A Guided Tour of Sample Code


The PostgreSQL GiST access method provides the basic GiST functionality inside PostgreSQL. Like all the Postgres access methods, it does nothing by itself -- one must define opclasses to make it work. With a little coding on your part, you can use the GiST access method to index data in a variety of ways, and support just about any query. It is much simpler to use than writing your own access method (which is a lot of work!)

We have provided some example opclasses to get you started.  The boxproc.* files contain code for the gist_box_ops opclass, which uses the GiST access method just like an R-tree with box_ops.  The polyproc.* files contain code for the gist_poly_ops opclass, which uses the GiST just like an R-tree with poly_ops (though the GiST is more efficient than the R-tree in this case, since it uses key compression to get better fanout!) The intproc.* and textproc.* files provide B-tree-like behavior for int and text attributes respectively. Though these are not as efficient as the native btree access method, they illustrate the flexibility of the GiST access method, and may be useful for cannibalizing when inventing new indexing schemes.

As an introduction to developing new GiST opclasses, we will step through some of these examples. We will go through the simplest project, gist_box_ops, in some detail, and then just describe some of the features of the other opclasses that are different from gist_box_ops.  Probably the easiest way to write new GiST opclasses is to cannibalize some of this sample code, so it's worth looking at carefully.


gist_box_ops

The GiST access method requires you to register 7 support functions, which encapsulate the behavior of the index. We will go through each of these in detail for gist_box_ops (in order):

  1. bool gbox_consistent(GISTENTRY *entry, BOX *query, StrategyNumber strategy): this routine takes an index entry (see src/backend/access/gist.h for the structure of a GISTENTRY), a value of the same type as the indexed attribute (in this case a BOX *), and a "strategy number" which corresponds to a procedure in the pg_amop table (the one that has amopclaid = <the oid of this opclass>, and amopstrategy = strategy). The routine should return FALSE if there can be no data below this entry that satisfies the predicate x op query (where op is the oper corresponding to strategy) and TRUE otherwise.
  2. BOX *gbox_union(bytea *entryvec, int *sizep): takes an array of GISTENTRY pointers, and a pointer to an integer. It returns an index key which is true of all the data below the entries. The integer pointer is set to the size of the index key that is returned. In this case, it returns the minimum bounding box which holds all the boxes in entryvec.
  3. GISTENTRY *gbox_compress(GISTENTRY *entry): takes an index entry with a standard key and returns an entry with a compressed version of the key. In the case of boxes, no compression is done, and the output is returned directly.
  4. GISTENTRY *gbox_decompress(GISTENTRY *entry): takes an index entry with a compressed key and returns an entry with a decompressed version of the key.  Again, this does nothing for boxes.
  5. GISTENTRY *gbox_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result): takes an existing index entry, a new entry to be inserted, and a pointer to a float, and sets the float to a number representing how bad it would be if the new entry were inserted under the index entry. In our case this is the increase in the size of origentry that would be needed to accomodate newentry.
  6. GISTENTRY *picksplit(bytea *entryvec, GIST_SPLITVEC *v): takes an array of entries, and a pointer to a GIST_SPLITVEC (see src/backend/access/gist.h), and assignes the entries to the two groups of the split vector. The two groups will end up on different index nodes. In our case we use Guttman's poly-time R-tree split algorithm, lifted from the postgres R-tree code. (Note that this split algorithm is not specific to boxes -- it will work for any data type for which you define a size metric. It may not result in the best split, however.)
  7. bool *gbox_same(BOX *b1, BOX *b2, bool *result): takes two index keys and a pointer to a bool, and sets the bool to true iff the two keys are equal.

Naturally if you write an opclass with keys of type FOO, all references above to type BOX should be changed to FOO. If you invent a new key type (as is done in the int ops and text ops described below) you will need to define the type to Postgres. The names of the functions above are insignificant; you can use whatever names you want, as long as you follow the instructions below on inserting into the pg_amproc catalog.

Now, once you have written these functions, and successfully compiled and linked them into a shared library suitable for dynamic loading, you have to inform Postgres of your new opclass. This is a 4-step process (see boxproc.src for examples):

  1. First, you have to issue create function commands for each of the functions above.
  2. Next, you add the name of your new opclass to the pg_opclass table:
    1. insert into pg_opclass values ('gist_box_ops');

  3. After that, you need to register any oper that your Consistent method can support as a "strategy" for the gist index, by inserting into the pg_amproc table.  The amstrategy field will need to correspond with what you expect in your Consistent function.  In our examples, we load a temporary table with the oids and oprnames of the opers of choice, and then insert them into pg_amproc. This procedure is explained in more detail in the PostgreSQL manual section on adding new indexing schemes.
  4. Finally, you need to register the support methods you wrote above in pg_amproc. Note that support routines must be numbered (via pg_amproc.amprocnum) according to the numbers in the list above.

That's is! Now you can just go ahead and issue a create index command to psql:


gist_int_ops

The int ops make the gist support B-tree-style lookups on int attributes. The code for the int ops is pretty similar to the box ops, with one significant difference: the key type in the int ops differs from the type of the attribute being indexed.  That is, we will use this opclass to index attributes of type int, but the keys in the tree are of a new type called intrange. In order to make this happen, we need to do two things:

  1. In our compress method (ir_compress), if the entry to be compressed is on a leaf node, we must convert it from attribute type (int) to key type (intrange).
  2. In our create index command, we must tell Postgres about the type of our keys (the default assumption is that they're the same as the attribute being indexed):
    1. create index intix on inttmp using gist (i:intrange gist_irange_ops);

    Note the :intrange after the name of the attribute i.

The other PostgreSQL access methods (btree, rtree, and hash) only work with keys that are of the same type as the attribute being indexed; do not attempt to specify a different key type for these indices!


gist_poly_ops

This opclass is identical to the box ops, except that the compress method converts polygons to their bounding boxes, so the key type is box even though the attribute type is polygon. This presents one minor problem -- compression of a polygon to a box is lossy, i.e. it loses some of the information in the original. The result of this is that the index may return too many tuples -- those which satisfy the search key, and possibly some more which do not. To ensure that the query executor later filters out these non-satisfying tuples, we must inform postgres of the lossy compression.  This is done with the optional with clause of create index:

A note on lossy compression: In indices, the compression can be lossy only in a particular way -- it can make keys more general than they previously were (e.g. convert a polygon to its bounding box). They can not make the keys more specific (e.g. convert a polygon to the largest box that can be inscribe within it), or the index will not retrieve all the tuples that satisfy a query predicate.

Note also that the other PostgreSQL access methods do not support key compression, and hence will ignore the with (islossy) clause.


gist_text_ops

The text ops are really just the same as the int ops, except the keys are ranges of texts, rather than ints. The one problem this presents is in memory management -- since text is passed by reference, we have to be careful to pfree any texts appropriately.


Writing your own opclass

If you've made it this far, you should be ready to develop your own opclasses for gists.  Some points to consider are whether your key type is the same as your attribute type, whether your index is lossy, and whether your key type is passed by value or by reference.  Based on these considerations, you should probably start your work by taking one of the opclasses above and modifying it to suit your needs.