PostgreSQL
The world's most advanced open source database

..::Planet PostgreSQL::..

February 08, 2010

Dave Page

FOSDEM 2010

Well, FOSDEM 2010 is now over and it all seemed to go pretty well. The PostgreSQL Project was represented well (as one would hope for our second biggest annual European gathering), with the majority of the Hotel Agenda Louise seemingly occupied by database geeks.

On the Friday night we had a database dinner at Les Brasseurs de la Grand Place where we were joined by Sergey Petrunya and Kristian Nielsen from MariaDB. Good conversation, good beer and good food followed later by an aborted attempt to join the FOSDEM Beer Event at Cafe Delirium (it was just too busy) and a successful landing at the Irish pub a short walk form the Hotel.


Saturday morning was the start of the conference itself. We'd hoped to scrounge an extra table (much needed, with the number of people and the amount of swag we had), but unfortunately that didn't work out. Somehow, we managed to squeeze onto one.


The talks started at 1PM in our dev room (which we only had for Saturday afternoon unfortunately). Magnus and Jean-Paul started with report on the state of PostgreSQL Europe, and then it was my turn with my talk on "Developments in PostgreSQL 9.0". Slides and the full schedule can be found on the PostgreSQL Wiki. The dev room was basically packed solid with people sitting on the floor for every talk. Please FOSDEM organisers - give us a bigger room next year!


Saturday night was a late dinner at possibly the slowest restaurant in the world, who clearly weren't expecting too many diners. Food wasn't too bad though, when it finally arrived.

Sunday was spent mostly on the booth and (in my case) in lots of ad-hoc meetings on topics such as the new PostgreSQL project infrastructure thats in development, as well as PGDay.EU 2010 which is being planned in Paris.

Most of us left at around 5PM, with Heikki, Greg, Magnus, Stefan and I heading for the airport. After pizza and a brief scare when Stefan lost his boarding pass, we left for home. Goodbye Brussels, see you next year.

More photos can be found here.

by Dave Page (noreply@blogger.com) at February 08, 2010 04:29 PM

Selena Deckelmann

Unlocking the clubhouse: cultural resistance and learning communities

I finished reading “Unlocking the clubhouse” on Saturday, finally. The book is only about 150 pages long, but it’s full of useful information about increasing participation of women in computer science.

The chapter that most stuck with me was chapter 6, “Persistence and Resistance: Staying in Computer Science.” I have said more than once, in a tongue-in-cheek way, that Code-n-Splode’s mantra for men who think that we should not have the “dude token” policy should be: “It’s just not about you.”

My feeling is that establishing a culture where female voices dominate, rather than are assimilated in, creates a social environment that’s fundamentally different. And that that difference is *good*. I wouldn’t say that the book totally supports that notion, but it points out situations where women found peer groups that did not conform to a male hacker stereotype, and that foundation of social support helped them stay in their course of study.

The students referred to in the paragraph are undergraduates at Carnegie Mellon University:

Women who accept the prevailing culture as the norm and who continuously compare themselves to this norm and find themselves coming up short are the ones who suffer the most.

The majority of women struggle to find a place where they can feel comfortable in the prevailing culture…

Ironically, it is in this area of relationship to culture that international women may have an edge. The international women do not as readily use the U.S. male hacker as their reference group. Since they are not fully part of this culture, their reference group is elsewhere. Many international students have alternative success norms and social bonds that protect them. Other priorities are dominant, and with these come other scales for self-evaluation.

So, rather than bringing their cultural norms to the hacker culture and modifying it, the international women have their own social structures which exist outside of the dominant culture. “Cultural resistance” was the title for this section, and it’s a great way of characterizing the lack of assimilation.

I have more than a few times heard women-specific groups discouraged because of they emphasize differences that the dominant culture feels should be unimportant. I’m interested in further research that discusses the effects of splinter groups, particularly when they are created for women.

The second interesting topic in this chapter concerned learning communities.

Former University of California calculus professor Uri Treisman (1992) believes that a supportive learning community is critically important for the success of minority students in math and science.

The story went on to describe Professor Treisman’s observation that Asian students tended to socialize *and* study in supportive groups, which tended to help students stick with the courses and get better grades. He established similar groups for Hispanic and African American students, and found across several universities and colleges that these groups helped retention. Our observations and the resulting user group for women mirrors that Professor’s experience.

There’s a special connection created when you live and engage with material in a supportive learning community. They take time to create, and are a bit harder to maintain outside of an academic context (where life, work and diverging interests can be a bit more challenging to coordinate).

Code-n-splode has been fairly quiet about its successes, but I think now is the time for us to start talking a bit more about how well the group has succeeded.

Photo courtesy of DrPantzo under a Creative Commons License.

by selena at February 08, 2010 04:07 PM

Magnus Hagander

Time management by (somebody elses) press-releases

A while back I submitted a couple of talks for PG-East 2010 in Philly, and over the past couple of weeks I've been nagging the organizers semi-frequently to get some pre-info on whether I've been accepted or not, since flight prices started to climb fairly rapidly. The site clearly says information that the information will be available on Feb 15th, so I can't really complain that the answer kept being "don't know yet".

A couple of days ago, I got a note from Dave pinged me with a message asking if I was approved. Turns out this press-release had been posted (by his company, no less). Which explicitly names me as a speaker at the conference.

Took me two more days of chasing down JD, but I now have confirmation I'll be there. I don't actually know what I'll be speaking about, but it's a pretty safe bet it will be PostgreSQL related.

I call this Time management by press releases. If I could only get it to apply to all meetings, I would no longer need to keep my own calendar up to date.

So, I'll see you in Philly!

by nospam@example.com (Magnus Hagander) at February 08, 2010 04:07 PM

Robert Gravsjö

OT: Django 1.2 on track

A little off topic but I'm very pleased to see that the timetable for Django 1.2 seems to be holding since they just entered feature freeze with beta 1. Among my favorite of features to come is, of course, support for multiple database backends. Other ...

February 08, 2010 08:24 AM

February 07, 2010

Andreas Scherbaum

PostgreSQL @ FOSDEM 2010

Author
Andreas 'ads' Scherbaum
The Open Source people have met again, like every year, in Brussels for the biggest open source event in europe - FOSDEM. The European PostgreSQL User Group was present with a booth and a devroom, plus a lot "community".


Continue reading "PostgreSQL @ FOSDEM 2010"

by nospam@example.com (Andreas 'ads' Scherbaum) at February 07, 2010 10:55 PM

Josh Berkus

Postgres needs a new Python driver

Per discussion on the pgsql-hackers mailing list this month, the PostgreSQL database really needs a new (or dramatically improved) Python driver. I have been only too aware of this due to the amount of Django work my team is doing lately. Let me sum up the situation so that you don't have to comb all the way through that thread. Hopefully at least one person reading this will take it as a call to action.

February 07, 2010 03:17 PM

David Fetter

PostgreSQL Weekly News vs. Casa Donde

This week's PostgreSQL Weekly news may be rescheduled to Wednesday February 10, evening PST.

Thanks very much to whatever component failed at my home for this.

by nospam@example.com (David Fetter) at February 07, 2010 09:32 AM

Leo Hsu and Regina Obe

PostGIS 1.5.0 out and PLR working on Windows 8.3-8.4 installs

PostGIS 1.5.0 is finally out

I'm happy to report that after a long haul, we have finally released PostGIS 1.5.0. Two months late, but there it is, and its a really great release I think; Perhaps the best release ever.

Details on what makes this release so special. The geodetic support.

Summary excerpted from Paul's slightly premature announcement

February 4, 2010

The PostGIS development team has, after a long course of reflection
and a detailed self-examination of our various personal failings,
decided to release PostGIS 1.5.0 to the public.

  http://postgis.org/download/postgis-1.5.0.tar.gz

This new version of PostGIS includes a new "geography" type for
managing geodetic (lat/lon) data, performance-enhanced distance
calculations, GML and KML format readers, an improved shape loading
GUI, and other new features as well.

Especial thanks to:

* Dave Skea for algorithms and mathematics necessary to support
spherical geometry
* Nicklas Avén for the new performance enhanced distance calculations
and other distance-related functions
* Sandro Santilli for new buffering features (end caps and style options)
* Olivier Courtin for GML/KML input functions
* Guillaume Lelarge for support for the upcoming PgSQL 9.0
* George Silva for an example implementation of history tables
* Vincent Picavet for Hausdorff distance calculations
* The maintainers of GEOS, Proj4, and LibXML, without whom we would
have less of a spatial database

Love, the PostGIS project steering committee,

Mark Cave-Ayland
Kevin Neufeld
Regina Obe
Paul Ramsey


Continue reading "PostGIS 1.5.0 out and PLR working on Windows 8.3-8.4 installs"

by nospam@example.com (Leo Hsu and Regina Obe) at February 07, 2010 05:58 AM

February 06, 2010

Dan Langille

Graphing

I have about 14,000 data points collected from backups. I started in Oct 2007, but have not done anything with the data. I’d like your help. I have put the data into a PostgreSQL database and dumped it. The dump is here: http://langille.org/tmp/dlt_stats.sql.gz The table looks like: $ psql dan psql (8.4.2) Type "help" for help. dan=# \d dlt_stats [...]

by Dan at February 06, 2010 06:04 PM

Andrew Dunstan

Damn it feels good to be a (perl) gangsta

The recent debate Bruce Momjian has started on the confusing mess that is the Python drivers for PostgreSQL made me feel very glad I just don't use any of them. As Just Another Perl Hacker, I am so glad that we have a good, well defined and fairly stable database API for Perl, namely DBI, and a pretty darn good implementation for PostgreSQL in DBD::Pg.

Add to these some of the higher end modules like DBIx::Class for use in frameworks and ORMs, and SQL::Translator for dealing with multiple database flavors, and Perl programmers doing database work are in pretty good shape.

by nospam@example.com (Andrew Dunstan) at February 06, 2010 02:33 PM

Endpoint Team

PostgreSQL version 9.0 release date prediction

So when will PostgreSQL version 9.0 come out? I decided to "run the numbers" and take a look at how the Postgres project has done historically. Here's a quick graph showing the approximate number of days each major release since version 6.0 took:

Some interesting things can be seen here: there is a rough correlation between the complexity of a new release and the time it takes, major releases take longer, and the trend is gradually towards more days per release. Overall the project is doing great, releasing on average every 288 days since version 6. If we only look at version 7 and onwards, the releases are on average 367 days apart. If we look at *just* version 7, the average is 324 days. If we look at *just* version 8, the average is 410. Since the last major version that came out was on July 1, 2009, the numbers predict 9.0 will be released on July 3, 2010, based on the version 7 and 8 averages, and on August 15, 2010, based on just the version 8 averages. However, this upcoming version has two very major features, streaming replication (SR) and hot standby (HS). How those will affect the release schedule remains to be seen, but I suspect the 9.0 to 9.1 window will be short indeed.

As a recap, the Postgres project only bumps the first part of the version number for major changes (Although many, myself included, would argue that 7.4 was such a major jump it should have been called 8.0). The second number occurs anytime a "new release" happens, and means new features and enhancements. The final number, the revision, is only incremented for security and bug fixes, and is almost always a 100% binary compatible drop in for the previous revision in the branch. (What's the average (mean) days between revisions? 84 days since version 6, and 88 days since version 7. The medians are 84 and 87 respectively.)

How busy were those periods? Here's the number of commits per release period. Note that I said release period, not release, as commits are still being made to old branches, although this is a very small minority of the commits, so I did not bother to break it down at that level.

There is a strong correlation with the previous chart. Of note is version 8.1, which had few commits and was released relatively quickly. Also note that version 8.0 is still winning as far as the sheer number of commits, most likely due to the fact that native Windows support was added in that version.

Some other items of interest from the data:

  • There have been roughly 140,000 commits from version 6.0 to 8.4.2.
  • There have been 32 CVS committers since the start of the project (and of course, many hundreds of others whose work was funnelled through those committers)
  • The mean number of commits per person is 4383, but the distribution is very skewed: Bruce, Peter, and Tom account for 80% of all commits, with the mean between them of 37,000 commits.
  • Commits changed about 40 lines on average.

Alright, two final charts: commits per time periods. I'll let the data speak for itself this time. Stay tuned for future blog posts exploring this data further!

by Greg Sabino Mullane (noreply@blogger.com) at February 06, 2010 08:10 AM

Peter Eisentraut

My Favorite PostgreSQL 9.0 Feature

My favorite PostgreSQL 9.0 feature does not have a two-letter acronym. It's the new bytea format, available since 8.5alpha1.

At F-Secure Labs, as you might imagine, we store information about a bunch of malware samples. Throughout the computer security industry, file samples, malware or perhaps not, are referred to by a hash value, such as MD5, SHA1, or SHA256. The typical representation of such hash values in most programming environments and also in prose and literature is the hexadecimal format, for example da39a3ee5e6b4b0d3255bfef95601890afd80709. Except when you want to write a test case against PostgreSQL or want to track down a problem, you'll be looking for \3329\243\356^kK\0152U\277\357\225`\030\220\257\330\007\011, also known as \\3329\\243\\356^kK\\0152U\\277\\357\\225`\\030\\220\\257\\330\\007\\011 in some contexts.

Well, that's over now; it will show as \xda39a3ee5e6b4b0d3255bfef95601890afd80709. You will still need to take care of the backslash, but that will surely be resolved when standard_conforming_strings is turned on in version 10.0 ;-), or we implement a new, SQL-standard conforming binary string type without legacy syntax issues.

By the way, the actual origin of this feature idea was a performance problem, reported by Bernd Helmle. The new format is quite a bit faster to encode and decode: In some internal tests, pg_dump of tables with mostly binary data was twice as fast and created a dump file that was half the size with the new format compared to the old format. So hopefully everyone wins.

Another new feature in the same area, by the way, is that PL/Python now supports bytea values sanely, contributed by Caleb Welton.

by Peter Eisentraut (peter@eisentraut.org) at February 06, 2010 01:19 AM

February 05, 2010

Dan Langille

Big storage

I’m thinking about creating a storage box. One with a lot of disk space. It will be used for backups and stuff. I will continue to backup to tape as well. I’m considering this case, which has 8xSATA hot-swap bays: http://www.newegg.com/Product/Product.aspx?Item=N82E16811192058 If I was buying a RAID card, I’d buy this one: http://www.newegg.com/Product/Product.aspx?Item=N82E16816131002&Tpk=areca%201120 But I may go with [...]

by Dan at February 05, 2010 05:59 PM

David Fetter

FOSDEM Starting

The herd of blue elephants is gathering for FOSDEM, including several talks in the main track.

I'd like to thank the FOSDEM for making it possible for me to attend this year.

More on this soon...

by nospam@example.com (David Fetter) at February 05, 2010 07:10 AM

February 04, 2010

Bulgarian PUG

Using G-WAN C Servlets with PostgreSQL under Win32

I have posted an example how to use G-WAN C Servlets with PostgreSQL. Check it out.

http://forum.trustleap.com/web/g-wanforum.nsf/viewTopic.xsp?topicID=14755geq70r28

G-WAN is a small-footprint (100 KB) Web server with C scripts executed in real-time (no compiler needed). G-WAN C scripts are as fast as static pages. It is much, much faster than Apache and IIS. More about it here

Peter

by Peter at February 04, 2010 07:14 PM

Bruce Momjian

Caring for Your (PG) Introvert

Many computer programmers are introverts because they have chosen the solitary job of creating programs that accomplish specific tasks. Open source developers are probably even more prone to be introverts because the (boring?) meetings, office chit-chat, and lunch outings are mostly gone, with the only communication being via email, instant messaging, twitter, and blogs. Open source social interaction is certainly more controlled than traditional communication.

Read More »

February 04, 2010 04:15 PM

Simon Riggs

Parallel Query (1)

I recently returned from a lunch meeting of the UK ex-Teradatans to see old friends and colleagues. Some people know that I spent time with Teradata when it was in startup mode, what seems like a very long time ago now. Anyway, that's left me with good knowledge and interest in parallel database systems. And that's why I know Greenplum's Chief Architect Chuck McDevitt and hence why I've been using Greenplum on and off since 2005. Greenplum have also funded some of the developments I've done for PostgreSQL.

I'm disappointed we've not made much progress with parallel operations and partitioning in core Postgres in last few releases. Recent Greenplum results show we have much work to do in improving things. http://community.greenplum.com/showthread.php?t=113
Some people may think I should be sad at that, though the way I see it, Greenplum is very close to being PostgreSQL. It just happens to have some good performance enhancements of great use in Data Warehousing. A few other enhanced versions of Postgres exist also.

Some other recent results also show that MonetDB and Infobright don't fare any better by comparison either.
http://community.greenplum.com/showthread.php?t=111

Having seen the above results I'm thinking about projects for next release now. Anybody want to fund some additional Data Warehousing features in Postgres core? I'm determined that next release we will get Bitmap Indexes in core, at least.

There's some more to discuss on parallel query, such as "How does this all relate to Hot Standby?", so I'll follow up later with another blog.

by noreply@blogger.com (Simon Riggs) at February 04, 2010 02:18 PM

Endpoint Team

LCA2010: Postgres represent!

I had the pleasure of attending and presenting at LinuxConf.AU this year in Wellington, NZ. Linux Conf.AU is an institution whose friendliness and focus on the practical business of creating and sustaining open source projects was truly inspirational.

My talk this year was "A Survey of Open Source Databases", where I actually created a survey and asked over 35 open source database projects to respond. I have received about 15 responses so far, and also did my own research on the over 50 projects I identified. I created a place-holder site for my research at: ossdbsurvey.org. I'm hoping to revise the survey (make it shorter!!) and get more projects to provide information.

Ultimately, I'd like the site to be a central location for finding information and comparing different projects. Performance of each is a huge issue, and there are a lot of individuals constructing good (and bad) systems for comparing. I don't think I want to dive into that pool, yet. But I would like to start collecting the work others have done in a central place. Right now it is really far too difficult to find all of this information.

Part of the talk was also a foray into the dangerous world of classification. I tried to put together basic categories, based on conversations with individual developers and some fine-tuning with Josh Berkus. Josh gave a short overview of database models during "Relational vs Non-relational" in the Data Storage mini-conf, and we collaborated some on category definition. I also saw Devdas Bhagat give a use case talk on using Postgres, yet again confirming how wonderful transactional DDL is for developers. I also gave a lightning talk (WITHOUT SLIDES!) on Bucardo at the tail end of the Data Storage mini-conf.

Josh Berkus, during "PostgreSQL Development Today", announced to the world that the new version of Postgres would be version 9.0! And he did a live demonstration of streaming replication and hot standby. The audience seemed pleased.

I was delighted to see representatives from the Postgres community on the main stage of the conference three times during LCA!

And finally, I had the pleasure of participating in the Friday keynote lightning talks. I kicked things off by telling the story of the elections in Ondo State, Nigeria, in 5 minutes. I saw that one of the IT people I met while in Akure was now helping Osun state investigate and correct election fraud in January. So glad to see that their good work continues!

by Selena Deckelmann (noreply@blogger.com) at February 04, 2010 12:44 PM

February 02, 2010

Andrew Dunstan

Gitting it better now

Well, with some help from some comments on this blog and also from rereading this message from Aidan Van Dyk and understanding it a lot more this time, I think I have a handle on how the buildfarm should support Git repositories.

When the branch copy (which will be called [branchname]/pgsql, just as it is with CVS) exists, the buildfarm client will do an update by simply calling git pull. Where it does not exist, the client will create it by calling git clone followed by git checkout -t origin/[branchname] -b bf_[branchname].

This part is already working with my test buildfarm animal, in fact.

What I will add is support for an additional git_references config parameter, which can be used as part of the clone command. By default it will not be there, in which case the user would get a plain clone of the configured master repo. Setting up the references repo would be something documented but not done by the script.

One of the things that is mildly annoying from a developer perspective (it won't really affect buildfarm users) is that, as I understand it, when using Aidan's recommendation, getting updates into your working copy will require two commands rather than one (git fetch on your local master followed by git pull on your working copy). I guess that can be aliased, and it's something I will probably do pretty soon.

by nospam@example.com (Andrew Dunstan) at February 02, 2010 11:17 PM

US PostgreSQL Association

Seattle PostgreSQL Users's Group

JD Wrote:

Last West, a sign up sheet was sent out for people who were interested in starting a Seattle PostgreSQL group. I am please to say that this group has started. Their first meeting is February 9th, 2010. I have the honor of being their first speaker. I will be speaking on Dumb Simple PostgreSQL Performance, which is the same talk I have been giving at various user groups around Seattle (Python, Django, Perl). You can check out there website here.

by jd at February 02, 2010 06:38 PM

Hubert Lubaczewski

Waiting for 9.0 – removal of 0000000001.history check

I tend to write about new features in new versions of PostgreSQL, but this patch actually fixes one of the things that annoy me a lot, so here it goes: On 26th of January, Simon Riggs committed: Log Message: ----------- Fix longstanding gripe that we check for 0000000001.history at start of archive recovery, even when we know it is never [...]

by depesz at February 02, 2010 02:35 PM

Endpoint Team

Automatic migration from Slony to Bucardo

About a month ago, Bucardo added an interesting set of features in the form of a new script called slony_migrator.pl. In this post I'll describe slony_migrator.pl and its three major functions.

The Setup

For these examples, I'm using the pagila sample database along with a set of scripts I wrote and made available here. These scripts build two different Slony clusters. The first is a simple one, which replicates this database from a database called "pagila1" on one host to a database "pagila2" on another host. The second is more complex. Its one master node replicates the pagila database to two slave nodes, one of which replicates it again to a fourth slave using Slony's FORWARD function as described here. I implemented this setup on two FreeBSD virtual machines, known as myfreebsd and myfreebsd2. The reset-simple.sh and reset-complex.sh scripts in the script package I've linked to will build all the necessary databases from one pagila database and do all the Slony configuration.

Slony Synopsis

The slony_migrator.pl script has three possible actions, the first of which is to connect to a running Slony cluster and print a synopsis of the Slony setup it discovers. You can do this safely against a running, production Slony cluster; it gathers all its necessary information from a few simple Slony queries. Here's the synopsis the script writes for the simple configuration I described above:

josh@eddie:~/devel/bucardo/scripts$ ./slony_migrator.pl -db pagila1 -H myfreebsd
Slony version: 1.2.16
psql version: 8.3
Postgres version: 8.3.7
Slony schema: _pagila
Local node: 1
SET 1: All pagila tables
* Master node: 1  Active: Yes  PID: 3309  Comment: "Cluster node 1"
  (dbname=pagila1 host=myfreebsd user=postgres)
  ** Slave node:  2  Active: Yes  Forward: Yes  Provider:  1  Comment: "Node 2"
     (dbname=pagila2 host=myfreebsd2 user=postgres)

The script has reported the Slony, PostgreSQL, and psql versions, the Slony schema name, and shows that there's only one set, replicated from the master node to one slave node, including connection information for each node. Here is the output of the same action, run against the complex slony setup. Notice that node 3 has node 2 as its provider, not node 1:

josh@eddie:~/devel/bucardo/scripts$ ./slony_migrator.pl -db pagila1 -H myfreebsd
Slony version: 1.2.16
psql version: 8.3
Postgres version: 8.3.7
Slony schema: _pagila
Local node: 1
SET 1: All pagila tables
* Master node: 1  Active: Yes  PID: 3764  Comment: "Cluster node 1"
  (dbname=pagila1 host=myfreebsd  user=postgres)
  ** Slave node:  2  Active: Yes  Forward: Yes  Provider:  1  Comment: "Cluster node 2"
     (dbname=pagila2 host=myfreebsd2 user=postgres)
  ** Slave node:  3  Active:  No  Forward: Yes  Provider:  2  Comment: "Cluster node 3"
     (dbname=pagila3 host=myfreebsd2 user=postgres)
  ** Slave node:  4  Active: Yes  Forward: Yes  Provider:  1  Comment: "Cluster node 4"
     (dbname=pagila4 host=myfreebsd  user=postgres)

This is a simple way to get an idea of how a Slony cluster is organized. Again, we can get all this without downtime or any impact on the Slony cluster.

Creating Slonik Scripts Automatically

Slony gets its configuration entirely through scripts passed to an application called Slonik, which writes configuration entries into a Slony schema within a replicated database. At least as far as I know, however, Slony doesn't provide a way to regenerate those scripts based on the contents of that schema. The slony_migrator.pl script will do that for you with the --slonik option. For example, here is the Slonik script it generates for the simple configuration:

josh@eddie:~/devel/bucardo/scripts$ ./slony_migrator.pl -db pagila1 -H myfreebsd --slonik
CLUSTER NAME = pagila;
NODE 1 ADMIN CONNINFO = 'dbname=pagila1 host=myfreebsd user=postgres';
NODE 2 ADMIN CONNINFO = 'dbname=pagila2 host=myfreebsd2 user=postgres';
INIT CLUSTER (ID = 1, COMMENT = 'Cluster node 1');
STORE NODE (ID = 2, EVENT NODE = 1, COMMENT = 'Node 2');
STORE PATH (SERVER = 1, CLIENT = 2, CONNINFO = 'dbname=pagila1 host=myfreebsd user=postgres', CONNRETRY = 10);
STORE PATH (SERVER = 2, CLIENT = 1, CONNINFO = 'dbname=pagila2 host=myfreebsd2 user=postgres', CONNRETRY = 10);
ECHO 'Please start up replication nodes here';
TRY {
    CREATE SET (ID = 1, ORIGIN = 1, COMMENT = 'All pagila tables');
} ON ERROR {
    EXIT -1;
}
SET ADD TABLE (ID = 6, ORIGIN = 1, SET ID = 1, FULLY QUALIFIED NAME = 'public.customer', KEY = 'customer_pkey', COMMENT = 'public.customer');
SET ADD TABLE (ID = 11, ORIGIN = 1, SET ID = 1, FULLY QUALIFIED NAME = 'public.language', KEY = 'language_pkey', COMMENT = 'public.language');
--- snip ---
SET ADD SEQUENCE (ID = 13, ORIGIN = 1, SET ID = 1, FULLY QUALIFIED NAME = 'public.store_store_id_seq', COMMENT = 'public.store_store_id_seq');
SET ADD SEQUENCE (ID = 10, ORIGIN = 1, SET ID = 1, FULLY QUALIFIED NAME = 'public.payment_payment_id_seq', COMMENT = 'public.payment_payment_id_seq');
SET ADD SEQUENCE (ID = 5, ORIGIN = 1, SET ID = 1, FULLY QUALIFIED NAME = 'public.country_country_id_seq', COMMENT = 'public.country_country_id_seq');
SUBSCRIBE SET (ID = 1, PROVIDER = 1, RECEIVER = 2, FORWARD = YES);

The pagila database contains many tables and sequences, and I've removed the repetitive commands to tell Slony about all of them, for the sake of brevity, but in its original form, the code above would rebuild the simple Slony cluster exactly, and can be very useful for getting an idea of how an otherwise unknown cluster is configured. I won't promise the Slonik code is ideal, but it does recreate a working cluster. The more complex Slonik output is very similar, differing only in how the sets are subscribed. Here I'll show only the major differences, which are the commands required to create the more complex Slony subscription scheme. In the downloadable script package I mentioned above, this subscription code is somewhat more complex, specifically because Slony won't let you subscribe node 3 to updates from node 2 until node 2 is fully subscribed itself. The slony_migrator.pl script isn't smart enough on its own to add necessary WAIT FOR EVENT Slonik commands, but it does get most of the code right, and, importantly, creates the subscriptions in the proper order.

SET ADD SEQUENCE (ID = 10, ORIGIN = 1, SET ID = 1, FULLY QUALIFIED NAME = 'public.payment_payment_id_seq', COMMENT = 'public.payment_payment_id_seq');
SET ADD SEQUENCE (ID = 5, ORIGIN = 1, SET ID = 1, FULLY QUALIFIED NAME = 'public.country_country_id_seq', COMMENT = 'public.country_country_id_seq');
SUBSCRIBE SET (ID = 1, PROVIDER = 1, RECEIVER = 4, FORWARD = YES);
SUBSCRIBE SET (ID = 1, PROVIDER = 1, RECEIVER = 2, FORWARD = YES);
SUBSCRIBE SET (ID = 1, PROVIDER = 2, RECEIVER = 3, FORWARD = YES);

Migrating Slony Clusters to Bucardo

The final slony_migrator.pl option will create a set of bucardo_ctl commands to create a Bucardo cluster to match an existing Slony setup. Although Bucardo can be configured by directly modifying its configuration database, a great deal of work of late has gone into making configuration easier through the bucardo_ctl program. Here's the output from slony_migrator.pl on the simple Slony cluster. Note the --bucardo command-line option, which invokes this function:

josh@eddie:~/devel/bucardo/scripts$ ./slony_migrator.pl -db pagila1 -H myfreebsd --bucardo
./bucardo_ctl add db pagila_1 dbname=pagila1  host=myfreebsd user=postgres
./bucardo_ctl add db pagila_2 dbname=pagila2  host=myfreebsd2 user=postgres
./bucardo_ctl add table public.customer db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add table public.language db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add table public.store db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add table public.category db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add table public.film db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
--- snip ---
./bucardo_ctl add sequence public.city_city_id_seq db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add sequence public.store_store_id_seq db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add sequence public.payment_payment_id_seq db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add sequence public.country_country_id_seq db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add sync pagila_set1_node1_to_node2 source=pagila_node1_set1 targetdb=pagila_2 type=pushdelta

The Bucardo model of a replication system differs from Slony, but the two match fairly closely, especially for a simple scenario like this one. But slony_migrator.pl will work for the more complex Slony example I've been using, shown here:

josh@eddie:~/devel/bucardo/scripts$ ./slony_migrator.pl -db pagila1 -H myfreebsd --bucardo
./bucardo_ctl add db pagila_1 dbname=pagila1  host=myfreebsd user=postgres
./bucardo_ctl add db pagila_4 dbname=pagila4  host=myfreebsd user=postgres
./bucardo_ctl add db pagila_3 dbname=pagila3  host=myfreebsd2 user=postgres
./bucardo_ctl add db pagila_2 dbname=pagila2  host=myfreebsd2 user=postgres
./bucardo_ctl add table public.customer db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add table public.language db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add table public.store db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
--- snip ---
./bucardo_ctl add sequence public.payment_payment_id_seq db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add sequence public.country_country_id_seq db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add sync pagila_set1_node1_to_node4 source=pagila_node1_set1 targetdb=pagila_4 type=pushdelta
./bucardo_ctl add sync pagila_set1_node1_to_node2 source=pagila_node1_set1 targetdb=pagila_2 type=pushdelta target_makedelta=on
./bucardo_ctl add table public.customer db=pagila_2 ping=true standard_conflict=source herd=pagila_node2_set1
./bucardo_ctl add table public.language db=pagila_2 ping=true standard_conflict=source herd=pagila_node2_set1
./bucardo_ctl add table public.store db=pagila_2 ping=true standard_conflict=source herd=pagila_node2_set1
--- snip ---
./bucardo_ctl add sequence public.store_store_id_seq db=pagila_2 ping=true standard_conflict=source herd=pagila_node2_set1
./bucardo_ctl add sequence public.payment_payment_id_seq db=pagila_2 ping=true standard_conflict=source herd=pagila_node2_set1
./bucardo_ctl add sequence public.country_country_id_seq db=pagila_2 ping=true standard_conflict=source herd=pagila_node2_set1
./bucardo_ctl add sync pagila_set1_node2_to_node3 source=pagila_node2_set1 targetdb=pagila_3 type=pushdelta

I mentioned the Bucardo data model differs from that of Slony. Slony contains a set of tables and sequences in a "set", and that Slony set remains a distinct object on all databases where those objects are found. Bucardo, on the other hand, has a concept of a "sync", which is a replication job from one database to one or more slaves (here I'm talking only about master->slave syncs, and ignoring for purposes of this post Bucardo's ability to do multi-master replication). This makes the setup slightly different for the more complex Slony scenario, in that whereas Slony has one set and different subscriptions, in Bucardo I need to define the tables and sequences involved in each of three syncs: one from node 1 to node 2, one from node 1 to node 4, and one from node 2 to node 3. I also need to turn on Bucardo's "makedelta" option for the node 1 -> node 2 sync, which is the Bucardo equivalent of the Slony FORWARD subscription option.

Migrating from Slony to Bucardo

This post is getting long, but for the sake of demonstration let's show a migration from Slony to Bucardo, using the more complex Slony example. First, I'll create a blank database, and install Bucardo in it:
josh@eddie:~/devel/bucardo$ createdb bucardo
josh@eddie:~/devel/bucardo$ ./bucardo_ctl install
This will install the bucardo database into an existing Postgres cluster.
Postgres must have been compiled with Perl support,
and you must connect as a superuser

We will create a new superuser named 'bucardo',
and make it the owner of a new database named 'bucardo'

Current connection settings:
1. Host:          /tmp
2. Port:          5432
3. User:          postgres
4. Database:      postgres
5. PID directory: /var/run/bucardo
Enter a number to change it, P to proceed, or Q to quit: 

I'll make the necessary configuration changes, and run the installation by following the simple menu.

Current connection settings:
1. Host:          /tmp
2. Port:          5432
3. User:          postgres
4. Database:      bucardo
5. PID directory: /home/josh/devel/bucardo/pid
Enter a number to change it, P to proceed, or Q to quit: p

Postgres version is: 8.3
Attempting to create and populate the bucardo database and schema
Database creation is complete

Connecting to database 'bucardo' as user 'bucardo'
Updated configuration setting "piddir"
Installation is now complete.

If you see any unexpected errors above, please report them to bucardo-general@bucardo.org

You should probably check over the configuration variables next, by running:
./bucardo_ctl show all
Change any setting by using: ./bucardo_ctl set foo=bar

Now I'll use slony_migrator.pl to get a set of bucardo_ctl scripts to build my Bucardo cluster:

josh@eddie:~/devel/bucardo/scripts$ ./slony_migrator.pl -db pagila1 -H myfreebsd --bucardo > pagila-slony2bucardo.sh
josh@eddie:~/devel/bucardo/scripts$ head pagila-slony2bucardo.sh 
./bucardo_ctl add db pagila_1 dbname=pagila1  host=myfreebsd user=postgres
./bucardo_ctl add db pagila_4 dbname=pagila4  host=myfreebsd user=postgres
./bucardo_ctl add db pagila_3 dbname=pagila3  host=myfreebsd2 user=postgres
./bucardo_ctl add db pagila_2 dbname=pagila2  host=myfreebsd2 user=postgres
./bucardo_ctl add table public.customer db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add table public.language db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add table public.store db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add table public.category db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add table public.film db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1
./bucardo_ctl add table public.film_category db=pagila_1 ping=true standard_conflict=source herd=pagila_node1_set1

I'll run the script...

josh@eddie:~/devel/bucardo$ sh scripts/pagila-slony2bucardo.sh
Added database "pagila_1"   
Added database "pagila_4"   
Added database "pagila_3"   
Added database "pagila_2"   
Created herd "pagila_node1_set1"
Added table "public.customer"
Added table "public.language"
Added table "public.store"
--- snip ---
Added sequence "public.store_store_id_seq"
Added sequence "public.payment_payment_id_seq"
Added sequence "public.country_country_id_seq"
Added sync "pagila_set1_node1_to_node4"
Added sync "pagila_set1_node1_to_node2"
Created herd "pagila_node2_set1"
Added table "public.customer"
Added table "public.language"
Added table "public.store"
--- snip ---
Added sequence "public.store_store_id_seq"
Added sequence "public.payment_payment_id_seq"
Added sequence "public.country_country_id_seq"
Added sync "pagila_set1_node2_to_node3"

Now all that's left is to shut down Slony (I just use the "pkill slon" command on each database server), start Bucardo, and, eventually, remove the Slony schemas. Note that Bucardo runs only on one machine (which in this case isn't either of the database servers I'm using for this demonstration -- Bucardo can run effectively anywhere you want).

josh@eddie:~/devel/bucardo$ ./bucardo_ctl start
Checking for existing processes
Removing /home/josh/devel/bucardo/pid/fullstopbucardo
Starting Bucardo
josh@eddie:~/devel/bucardo$ tail -f log.bucardo 
[Mon Feb  1 21:45:27 2010]  KID Setting sequence public.actor_actor_id_seq to value of 202, is_called is 1
[Mon Feb  1 21:45:27 2010]  KID Setting sequence public.city_city_id_seq to value of 600, is_called is 1
[Mon Feb  1 21:45:27 2010]  KID Setting sequence public.store_store_id_seq to value of 2, is_called is 1
[Mon Feb  1 21:45:27 2010]  KID Setting sequence public.payment_payment_id_seq to value of 32098, is_called is 1
[Mon Feb  1 21:45:27 2010]  KID Setting sequence public.country_country_id_seq to value of 109, is_called is 1
[Mon Feb  1 21:45:27 2010]  KID Total delta count: 0
[Mon Feb  1 21:45:27 2010]  CTL Got notice "bucardo_syncdone_pagila_set1_node1_to_node2_pagila_2" from 22961
[Mon Feb  1 21:45:27 2010]  CTL Sent notice "bucardo_syncdone_pagila_set1_node1_to_node2"
[Mon Feb  1 21:45:27 2010]  CTL Got notice "bucardo_syncdone_pagila_set1_node1_to_node4_pagila_4" from 22962
[Mon Feb  1 21:45:27 2010]  CTL Sent notice "bucardo_syncdone_pagila_set1_node1_to_node4"

Based on those logs, it looks like everything's running fine, but just to make sure, I'll use bucardo_ctl's "list syncs" and "status" commands:

josh@eddie:~/devel/bucardo$ ./bucardo_ctl list syncs
Sync: pagila_set1_node1_to_node2  (pushdelta)  pagila_node1_set1 =>  pagila_2  (Active)
Sync: pagila_set1_node1_to_node4  (pushdelta)  pagila_node1_set1 =>  pagila_4  (Active)
Sync: pagila_set1_node2_to_node3  (pushdelta)  pagila_node2_set1 =>  pagila_3  (Active)

josh@eddie:~/devel/bucardo$ ./bucardo_ctl status
Days back: 3  User: bucardo  Database: bucardo  Host: /tmp  PID of Bucardo MCP: 22936
Name                       Type  State PID   Last_good Time  I/U/D Last_bad Time
==========================+=====+=====+=====+=========+=====+=====+========+====
pagila_set1_node1_to_node2| P   |idle |22952|52s      |0s   |0/0/0|unknown |    
pagila_set1_node1_to_node4| P   |idle |22953|52s      |0s   |0/0/0|unknown |    
pagila_set1_node2_to_node3| P   |idle |22954|52s      |0s   |0/0/0|unknown |    

Everything looks good. Before I test that data are really replicated correctly, I'll issue the a "DROP SCHEMA _pagila CASCADE" command in each database, which I can do while Bucardo's running. If this were a production system, the best strategy, to avoid things getting replicated twice) would be to stop all applications, stop Slony, start Bucardo, and start the applications, though because Slony and Bucardo both replicate rows using primary keys, doing otherwise wouldn't cause duplicated data.

Finally, I'll tail the Bucardo logs while inserting rows in the pagila1 database, to see what happens. These rows tell me it's working:

[Mon Feb  1 21:55:42 2010]  KID Setting sequence public.payment_payment_id_seq to value of 32098, is_called is 1
[Mon Feb  1 21:55:42 2010]  KID Setting sequence public.inventory_inventory_id_seq to value of 4581, is_called is 1
[Mon Feb  1 21:55:42 2010]  KID Setting sequence public.country_country_id_seq to value of 109, is_called is 1
[Mon Feb  1 21:55:42 2010]  KID Total delta count: 1
[Mon Feb  1 21:55:42 2010]  KID Deleting rows from public.actor
[Mon Feb  1 21:55:42 2010]  KID Begin COPY to public.actor
[Mon Feb  1 21:55:42 2010]  KID End COPY to public.actor
[Mon Feb  1 21:55:42 2010]  KID Pushdelta counts: deletes=0 inserts=1
[Mon Feb  1 21:55:42 2010]  KID Updating bucardo_track for public.actor on pagila_1
...
[Mon Feb  1 21:55:43 2010]  CTL Got notice "bucardo_syncdone_pagila_set1_node1_to_node4_pagila_4" from 22962
[Mon Feb  1 21:55:43 2010]  CTL Sent notice "bucardo_syncdone_pagila_set1_node1_to_node4"
[Mon Feb  1 21:55:43 2010]  CTL Got notice "bucardo_syncdone_pagila_set1_node1_to_node2_pagila_2" from 22961
[Mon Feb  1 21:55:43 2010]  CTL Sent notice "bucardo_syncdone_pagila_set1_node1_to_node2"

In this case I need to "kick" the node 2 -> node 3 sync to get it to replicate, but I could configure the sync with a timeout so that happened automatically. Once I do that, I get log messages for it as well.

[Mon Feb  1 22:00:34 2010]  CTL Got notice "bucardo_syncdone_pagila_set1_node2_to_node3_pagila_3" from 22963
[Mon Feb  1 22:00:34 2010]  CTL Sent notice "bucardo_syncdone_pagila_set1_node2_to_node3"

Please consider giving slony_migrator.pl a try. I'd be glad to hear how it works out.

by Joshua Tolley (noreply@blogger.com) at February 02, 2010 09:19 AM

February 01, 2010

Hubert Lubaczewski

Waiting for 9.0 – table and index sizes

On 19th of January Tom Lane committed really brilliant patch: Log Message: ----------- Add pg_table_size() and pg_indexes_size() to provide more user-friendly wrappers around the pg_relation_size() function.   Bernd Helmle, reviewed by Greg Smith Why is it brilliant? Because I’m lazy. And I think it’s a virtue, and not flaw. Let’s consider this example: create table x ( id serial primary key, [...]

by depesz at February 01, 2010 10:33 PM