pg51g


Summary

pg51g is a data diff toolkit for PostgreSQL. It is meant to be pronounced 'pee-gee-sig' -- which is what you get if you unfocus enough while looking at the name :-) I know, it's a terrible name.

It is also an accurate name, as it generates and maintains md5-based signature tables for your relations in a PostgreSQL database. These sig tables may be used in two ways:

This software has been based on Fabien Coelho's report on Remote Comparison of Database Tables. The full text may be found here. This software, however, represents a brand new implementation in C and PL/pgSQL, with modifications.

Availability

The pg51g code has been released with a BSD license, which is what applies to the PostgreSQL distribution. The most recent version may always be checked out from the author's github account: http://github.com/mnacos

I will also try to update this link with the most recent "stable" version, from time to time. Please remember, I am making no guarantees of pg51g's fitness for any task. It will always be "beta" software.

You might also be interested in my rants.

Documentation

The only available documentation right now is this html page and the pg51g.nw file included in the distribution tarball.

Installation instructions

The only available option for pg51g installation is building from source.

Prerequisites:

Steps:
  1. Extract the source code and cd to the source directory
    tar xvzf pg51g-0.92-rc1.tar.gz; cd pg51g-0.92-rc1
  2. Build the library (you might need to edit Makefile if pg_config not in PATH)
    make pg51g.so
  3. Copy the shared library and the sql file in the right postgresql directories
    cp pg51g.so /usr/local/pgsql/lib; cp pg51g.sql /usr/local/pgsql/shared/contrib
  4. (optional) Run the pg51g tests
    ./test-pg51g.pl
  5. Use pg51g.sql to import the pg51g functions in a database of choice
    cat /usr/local/pgsql/shared/contrib/pg51g.sql | psql -U postgres targetdb
  6. Put some tables under pg51g control

Adding tables or views to the pg51g system

To include all non-system tables you would simply use the psql shell to type:

SELECT * FROM pg51g_bulk_add(); SELECT * FROM pg51g_bulk_snap();
This step may take from seconds to hours, depending on the size of your database and the speed of your machine.

Alternatively, you could add individual tables or views (e.g. myschema.mytable) by typing:

SELECT pg51g.add('myschema.mytable'); SELECT pg51g.snap('myschema.mytable');

Note

Tables with no primary keys cannot be added to the system, unless a pseudo primary key is defined with the key() function. This also applies to all VIEWs, since there is no such thing as a primary key for a view. e.g.

SELECT pg51g.key('some.view', 'field1||''-''||field2');

From here, you are on your own: you need to make sure the pseudo key values are unique in the target relation, or the diff results will be wrong.

To exclude a single table after adding all tables to pg51g, you would follow the pg51g_bulk_add() operation with a pg51g.rm() one.

When do I need to use pg51g.do(...)?

By adding a table to pg51g control, a trigger is attached to this table which keeps the resulting signature table up-to-date. You only need to use pg51g.do() to update the signature table if your dealing with VIEWs, or you have disabled the relevant trigger e.g. via pg51g.unmonitor_table(). When using do(), the entire signature table is regenerated.

Monitoring schema changes

You could easily place a database schema under revision control by using pg51g's support for VIEWs. For example, to monitor for changes in attribute types, nullability, ordinal position etc. you would:

  1. Create a new schema, called schema, to store all schema-related signature tables
    CREATE SCHEMA schema;
  2. Create an appropriate view of information_schema.columns, excluding system and helper namespaces
    CREATE VIEW schema.columns AS SELECT * FROM information_schema.columns
       WHERE table_schema NOT IN ('pg_catalog','information_schema', 'pg51g', 'schema');
  3. Select an appropriate pseudo-primary key (views do not have primary keys in postgresql)
    SELECT pg51g.key('schema.columns','table_schema||''.''||table_name||''.''||column_name');
  4. Add schema.columns to pg51g control and take a snapshot of the signatures
    SELECT pg51g.add('schema.columns'); SELECT pg51g.snap('schema.columns');
  5. You may now check the current state against this snapshot by runnning
    SELECT pg51g.do('schema.columns'); SELECT pg51g.diff('schema.columns');
    or
    SELECT pg51g.do('schema.columns'); SELECT (pg51g.diff('schema.columns')).*;
That's it... now, any changes you make to an attribute in any table in your database are detectable

Example

if you run...

ALTER TABLE somedata.sometable ALTER COLUMN myvarchar TYPE TEXT;
SELECT pg51g.do('schema.columns); SELECT (pg51g.diff('schema.columns')).*;
you will get:
             key              |   op
------------------------------+--------
 somedata.sometable.myvarchar | UPDATE
(1 row)

You probably need to know more than changes to attribute types. You might, for example, want to keep track of table constraints, which include primary keys. All you need to do is create another view (schema.table_constraints) referencing information_schema.table_constraints and add it to pg51g control. And so on...

Comparing signature tables across databases

To compare signature tables across different databases you need to use an external tool, such as JComparator, which is provided with the pg51g distribution. JComparator works, but it is proof-of-concept code and it simply prints the differences it has found to standard output. You can easily, however, substitute all System.out.println() calls for level 0 with push operations to either memory structures or message buses. Or, one of these days, I will turn JComparator into a proper class which can be easily extended.

For now, you may easily run it from the command line with a convenience ant script like so:

ant -Dtbl1 pg51g.schema_columns -Dtbl2 pg51g.saved_schema_columns \
    -Dusr1 postgres -Dusr2 postgres run

The above example simulates the diff of the previous section. Note, we are referencing the pg51g signature tables directly, not the target table the signatures are about (pg51g.schema_columns instead of schema.columns). This is the output you would expect:

run:
     [echo] Starting the JComparator process...
     [java]
----------------------------------------------------------------------------------
     [java] Connection 1:          jdbc:postgresql://localhost:5432/pg51gTests
     [java] Table 1:               pg51g.schema_columns
     [java]
----------------------------------------------------------------------------------
     [java] Connection 2:          jdbc:postgresql://localhost:5432/pg51gTests
     [java] Table 2:               pg51g.saved_schema_columns
     [java]
----------------------------------------------------------------------------------
     [java] Database connection 1: OK
     [java] Database connection 2: OK
     [java]
----------------------------------------------------------------------------------
     [java] --> Level: 0
     [java]
----------------------------------------------------------------------------------
     [java] UPDATE:  Pri: somedata.sometable.myvarchar Key: ca52e0a1a0e301d209e...
     [java]
----------------------------------------------------------------------------------
     [java] Number of differences: 1

Anything not specified in the command line will be read from the build.xml file, in the above example the default connection string is jdbc:postgresql://localhost:5432/pg51gTests for both databases. If you wanted to connect to another server or database, you would probably write:

    ant -Dstr1 jdbc:postgresql://server1:5432/targetdb \
        -Dstr2 jdbc:postgresql://server2:5432/targetdb ...

and compare signature tables in the targetdb databases of servers server1 and server2. You could also create custom build.xml scripts for the most frequent comparisons and use these with ant (with the -f switch), instead of including passwords in the command line!

To use JComparator, you obviously need Java and ant installed on your system, as well as the PostgreSQL JDBC driver (available here), placed in the lib/ directory. Running ant with no arguments will give you a list of options.

Connection parameters

The full set of connection parameters for the JComparator ant script is: