The original GiST publication ([HNP95]) specifies an extension interface that is relatively simple and assumes a node organization and node traversal algorithms similar to those of R-tree nodes (which store entries in an unordered, heap-like fashion on a page; as a result, every entry on a page must be checked when traversing the page or when looking for matches to a query). Libgist implements this simple interface and another, more low-level extension interface. The latter allows a derived extension class to implement a node layout that is tuned for the particular AM at hand. An example of this is again the B-tree extension, which implements ordered node entries in order to do binary search when checking for matching entries.
The following is an overview of the classes and their roles within libgist.
The classes relevant for extension designers are gist_ext_t and
gist_unordered_t.
gist_m | Class for index objects, which also contains all the manipulation and query functions for indices. |
gist_ext_t | Low-level extension class, which specifies a node-action-oriented extension interface in order to allow the AM designer to tune the node layout. |
gist_unorderedn_t | Subclass of gist_ext_t, which implements an R-tree-style unordered node layout.In order to create an object of this class, an object of class gist_unordered_t must be supplied as a parameter. |
bt_ext_t | Subclass of gist_ext_t, which implements an ordered node layout. (Since this is tantamount to B-trees, we decided not to call this class gist_orderedn_t.) |
gist_unordered_t | Base class for the simple extension interface as described in the original GiST publication. |
rt_ext_t | Subclass of gist_unordered_t, which implements the traversal and node split behavior of R-trees. |
class gist_ext_t { ... }; class gist_unorderedn_t : gist_ext_t { ... gist_unorderedn_t::gist_unorderedn_t(const gist_unordered_t& fcts); }; class gist_unordered_t { ... }; class rt_ext_t : public gist_unordered_t { ... }; gist_unorderedn_t bt_int_ext(rt_ext_t(...));The R-tree example shows that:
The extension implementor can design an extension class that utilizes the node layout extension interface in two ways:
The following are the interface functions that subclasses of gist_ext_t must implement. The parameters of class vec_t are vectors of data pointer/length (described in more detail in the next section as well).
Libgist makes the page structure and management functions of its index nodes available through the class gist_p. This class implements a simple slotted page organization (defined by page_s) - with a page header at the beginning and a slot directory at the end of the page - and specifies the format of the page entries. The entries have headers (structure keyrec_t), which contain the size in bytes of the predicate and the data pointer, as well as a child pointer (only useful for internal nodes). The entries are required to store the predicate and data pointer portions contiguously (but either one can have length 0). Also, the length info stored in the header must reflect the actual length of the predicate stored on the page, even if it is compressed and the expanded lengths deviate from the stored length. Class gist_p stores a header record and a BP on every page in the first and second available slot (only root pages do not contain a BP). Nevertheless, the data entries added to the page by the gist_p user still start at slot index 0, because the public member functions of gist_p automatically compensate for the header and the BP. Subclasses of gist_ext_t must use gist_p to implement the desired node layout and the entries of a node must have the keyrec_t header. Furthermore, every item on the page that has its own data or child pointer must have its own slot and header associated with it, so that both pointers are accessible via gist_p::rec(slotindex).elem() and gist_p::rec(slotindex).child(). (Unfortunately, this rules out storing duplicates as (pred, data ptr1, data ptr2, ...) lists; this might be fixed in a future version of libgist.) The constant gist_p::max_tup_sz specifies the maximum size that a predicate in a data entry is allowed to have.
Class vec_t is used frequently in the interfaces
of gist_ext_t and gist_p to pass data pointer/length
pairs or vectors of these pairs as a single parameter. The interfaces of
vec_t and cvec_t (for vectors of
constant pointer/length pairs) contain functions to construct
vectors (in various flavors), reset and inspect
the vectors (ptr and len).
class gist_cursor_t; class gist_lstk; class gist_ustk; class gist_p; class gist_m { public: gist_m(); ~gist_m(); // create an empty Gist rc_t create( const char* filename, gist_ext_t* ext); // open an existing Gist rc_t open( const char* filename, gist_ext_t* ext); rc_t close(); rc_t flush(); // insert a single key rc_t insert( const void* key, const int keyLen, const void* dataPtr, const int dataPtrLen); // remove all keys matching the query rc_t remove( const void* query); // initialize the cursor rc_t fetch_init( gist_cursor_t& cursor, const void* query); // fetch the next data item rc_t fetch( gist_cursor_t& cursor, void* key, smsize_t& keyLen, void* dataPtr, smsize_t& dataPtrLen, bool& eof); // returns true if tree contains no data bool is_empty(); // checks structural and data integrity of the tree; // outputs violations to stderr rc_t check(); // dumps the content of a page to stdout; // if 0 is specified as the pgno, dumps out the tree rc_t dump( shpid_t pgno = 0); };
class gist_predcursor_t; class gist_penalty_t { public: const double max_penalty = MAXDOUBLE; gist_penalty_t() {} gist_penalty_t(double p) : p(p) {} double p; // a double must be enough bool operator<(const gist_penalty_t &pen) { return p < pen.p; } gist_penalty_t& operator=(const gist_penalty_t &pen) { p = pen.p; return *this; } }; class gist_ext_t { public: // here are the node layout-specific routines // Create a new entry on a page. // Return eRECWONTFIT if the new entry does not fit on the page. virtual rc_t insert( gist_p& page, const vec_t& key, const vec_t& data, shpid_t child) = 0; // Remove a number of entries, identified by their slot indices. virtual rc_t remove( gist_p& page, int slots[], int numSlots) = 0; // For internal nodes only: update the key for the entry in the given slot. virtual rc_t updateKey( gist_p& page, int& slot, // in/out: old/new slot assignment const vec_t& newKey) = 0; // Find the entry with the lowest insertion penalty for the given key; // return the slot index in 'slot'. virtual void findMinPen( const gist_p& page, const vec_t& key, const vec_t& data, int& slot) = 0; // Search the page for entries matching the given query; // return their slot indices in 'matches'. virtual void search( gist_p& page, const void* query, int matches[], int& numMatches) = 0; // Return the key of the entry in the given slot. If a conversion/decompression // of the key takes place, the key is written into the memory pointed to by // key.ptr(0). // If no conversion takes place, the key points to the buffered page. virtual void getKey( gist_p& page, int slot, vec_t& key) = 0; // Determine how to split the given node. The split info is the slot indices of // those entries that go to the right sibling. Make sure that rightEntries only // contains the indices of existing slots. This procedure also calculates the // new BPs (which have to be copied into left-/rightBp). // The new entry/-ies that will be inserted after the split are given in // entry1 and -2 (if they are leaf entries, the second component will hold the data // pointer); those need to be taken into consideration when calculating the BPs. // (if there's no second entry to be inserted, entry2.size() == 0). The caller also // needs to know which page to insert each entry on; this information is returned // in *GoesRight. virtual rc_t pickSplit( gist_p& page, int rightEntries[], int& numRight, const vec_t& oldBp, vec_t& leftBp, vec_t& rightBp, const vec_t& entry1, bool& oneGoesRight, const vec_t& entry2, bool& twoGoesRight) = 0; // Compute the BP of the page after item(s) were deleted, given the prior // BP. virtual void unionPage( gist_p& page, vec_t& bp, bool& bpChanged) // set to true if BP changed = 0; // here are the datatype-specific routines // union the newly inserted pred to the BP; // the storage allocated for the bp is max_tup_sz; // indicate if BP changed; virtual void unionKey( vec_t& bp, int level, // level of page where BP is on const vec_t& pred, bool &bpChanged // out: true if bp changed ) = 0; // Return true if the predicate is contained by the BP. // Used during gist_t::check() to verify the integrity of // the node's content. virtual bool check( const vec_t& bp, const vec_t& pred, int level) // level of page = 0; virtual void printPred( const vec_t& pred, int level) // level of page where pred is stored = 0; virtual void printData( const vec_t& data) = 0; };
class bt_ext_t : public gist_ext_t { public: // generic comparison function typedef int (*CmpFct)(const void *a, const void *b); CmpFct keyCmp; // for keys CmpFct dataCmp; // for data pointers // returns the size in bytes of a key stored on a disk page; // the goal is to get to the data pointer portion of an internal entry, // given the pointer to the key portion typedef int (*SizeFct)(const void *e); SizeFct size; // for printPred() typedef void (*PrintFct)(const void *pred); PrintFct prPred, prData; // for pickSplit() (creating new BPs) typedef void (*NegInftyFct)(void *x); NegInftyFct negInftyKey, negInftyData; bt_ext_t(CmpFct keyCmp, CmpFct dataCmp, SizeFct size, PrintFct prPred, PrintFct prData, NegInftyFct negInftyKey, NegInftyFct negInftyData) : keyCmp(keyCmp), dataCmp(dataCmp), size(size), prPred(prPred), prData(prData), negInftyKey(negInftyKey), negInftyData(negInftyData) {}; rc_t insert( gist_p& page, const vec_t& key, const vec_t& data, shpid_t child); rc_t remove( gist_p& page, int slots[], int numSlots); rc_t updateKey( gist_p& page, int& slot, const vec_t& newKey); void findMinPen( const gist_p& page, const vec_t& key, const vec_t& data, int& slot); void search( gist_p& page, const void* query, int matches[], int& numMatches); void getKey( gist_p& page, int slot, vec_t& key); rc_t pickSplit( gist_p& page, int rightEntries[], int& numRight, const vec_t& oldBp, vec_t& leftBp, vec_t& rightBp, const vec_t& entry1, bool& oneGoesRight, const vec_t& entry2, bool& twoGoesRight); void unionPage( gist_p& page, vec_t& bp, bool& bpChanged); void unionKey( vec_t& bp, int level, const vec_t& pred, bool &bpChanged); bool check( const vec_t& bp, const vec_t& pred, int level); void printPred( const vec_t& pred, int level); void printData( const vec_t& data); };
class gist_unorderedn_ext_t : public gist_ext_t { public: gist_unordered_ext_t& ext; rc_t insert( gist_p& page, const vec_t& key, const vec_t& data, shpid_t child); rc_t remove( gist_p& page, int slots[], int numSlots); rc_t updateKey( gist_p& page, int& slot, const vec_t& newKey); void findMinPen( const gist_p& page, const vec_t& newKey, const vec_t& data, int& slot); void search( gist_p& page, const void* query, int matches[], int& numMatches); void getKey( gist_p& page, int slot, vec_t& key); rc_t pickSplit( gist_p& page, int rightEntries[], int& numRight, const vec_t& oldBp, vec_t& leftBp, vec_t& rightBp, const vec_t& entry1, bool& oneGoesRight, const vec_t& entry2, bool& twoGoesRight); void unionPage( gist_p& page, vec_t& bp, bool& bpChanged); void unionKey( vec_t& bp, int level, const vec_t& pred, bool &bpChanged); bool check( const vec_t& bp, const vec_t& pred, int level); void printPred( const vec_t& pred, int level); void printData( const vec_t& data); gist_unorderedn_ext_t(gist_unordered_ext_t& ext); };
class gist_predcursor_t { public: struct entry { const void *key; int keyLen; }; // + 2: for new entries const int maxElems = gist_p::max_scnt + 2; int numElems; entry elems[maxElems]; gist_predcursor_t(); ~gist_predcursor_t(); // make cursor return predicate void add(const void* data, int len); // make cursor return keys on page (except for BP) void add(gist_p& page); // start from the beginning void reset(); }; class gist_unordered_ext_t { public: // consistent: evaluates pred w.r.t. query virtual bool consistent( const void* query, const void* pred, int predLen, int level) = 0; // at what level was that predicate found (leaf = 1) // returns insertion penalty of new key into subtree virtual void penalty( const void* subtreePred, int predLen, int level, // at what level was that predicate found const void* newKey, int keyLen, gist_penalty_t& p) = 0; // union the newly inserted pred to the BP; // the length of the old bp is // indicate if BP changed; virtual void union_key( void* bp, // in: old BP, out: updated BP int& len, // in: length of old BP; out: length of changed BP int level, // level of page where BP is on const void* newPred, // leaf-type predicate int newLen, // length of predicate bool& changed // out: true if bp changed ) = 0; // union all the keys on a single page // done after an item was deleted; // indicate if BP changed; virtual void union_page( void* bp, int &len, int level, // level of BP gist_predcursor_t& pcursor, bool& changed) = 0; // Determine which entries go on the new sibling after a split // (the indices of the entries that should be moved right are stored in rightEntries, // their number in numRight; the indices are returned by gist_predcursor_t::fetch() and do // not have to be stored in the order delivered by fetch() in rightEntries); // compute the new BPs for both nodes // (can't do it with union, because we might need to know the BP of the // other sibling). // If the old BP is NULL, we're splitting the root; the bounds should // be assumed to be infinite in this case. virtual void pickSplit( gist_predcursor_t& pcursor, // used to retrieve entries for page int level, // level of page (1 = leaf) int rightEntries[], // out: store indices of entries to go on right sibling int& numRightEntries, // out: store number of entries to go right const void* oldBp, // pre-split BP of original page int oldLen, // length of pre-split BP void* leftBp, // out: new BP of original page int& leftLen, // out: length of new left BP void* rightBp, // out: new BP of new right sibling int& rightLen) // out: length of new right BP = 0; // Return true if the predicate is contained by the BP. // Used during gist_t::check() to verify the integrity of // the node's content. virtual bool check( const void* bp, // BP on page int bplen, // length of BP const void* pred, // single entry's predicate int predlen, // length of predicate int level) // level of page = 0; virtual void printPred( const void* pred, int plen, int level) // level of page where pred is stored = 0; virtual void printData( const void* data, int dlen) = 0; };
class rt_ext_t : public gist_unordered_ext_t { public: const int numLvl = 2; // only 2 levels to distinguish: leaf/non-leaf typedef bool (*CmpFct)(const void *, int, const void *); // any point/rect comparison function // function table: one for each operator, possible argument type // and leaf/internal level typedef CmpFct CmpFctTbl[numLvl][rt_query_t::rt_numarg][rt_query_t::rt_numoper]; CmpFctTbl cmpFcts; // for consistent() // for penalty()/pickSplit()/union_key(): // expand rect with rect/pt typedef void (*ExpandFct)(rt_rect &r, const void *item, int len); ExpandFct expand; // for pickSplit() typedef double (*DistanceFct)(const void *item1, const void *item2, int len); DistanceFct dist; // compute the distance between two rects/pts typedef int (*DimFct)(int dataSz); // returns dimension of points/rects DimFct dim; typedef void (*RectFct)(rt_rect &r, const void *item, int len); RectFct copyRect; // creates rect from pt/rect rt_ext_t(CmpFctTbl tbl, ExpandFct exp, DistanceFct distance, DimFct dim, RectFct rf); bool consistent( const void *query, const void *pred, int predLen, int level); void penalty( const void *subtreePred, int predLen, int level, const void *newKey, int keyLen, gist_penalty_t &p); void union_key( void *bp, int &len, int level, const void *newPred, int newLen, bool &changed); void union_page( void *bp, int &len, int level, gist_predcursor_t &pcursor, bool &changed); void pickSplit( gist_predcursor_t &pcursor, int level, int rightEntries[] /*out*/, int &numRightEntries /*out*/, const void *oldBp, int oldLen, void *leftBp /*out*/, int &leftLen /*in/out*/, void *rightBp /*out*/, int &rightLen /*int/out*/); bool check( const void *bp, int bplen, const void *pred, int predlen, int level); void printPred( const void *pred, int plen, int level); void printData( const void *data, int dlen); };
/* * Basic page structure for all pages. */ struct page_s { struct slot_t { int2 offset; // -1 if vacant uint2 length; }; class space_t { public: space_t() {}; ~space_t() {}; void init(int); int nfree() const; int usable(); // slot_bytes means bytes for new slots rc_t acquire(int amt, int slot_bytes); void release(int amt); private: void _check_reserve(); int2 _nfree; // free space counter }; enum { data_sz = (SM_PAGESIZE - sizeof(lpid_t) - sizeof(space_t) - 3 * sizeof(int2) - 2 * sizeof(slot_t) - 2 * sizeof(int2)), max_slot = data_sz / sizeof(slot_t) + 2 }; lpid_t pid; // id of the page space_t space; // space management uint2 end; // offset to end of data area int2 nslots; // number of slots int2 nvacant; // number of vacant slots char data[data_sz]; // must be aligned slot_t reserved_slot[1]; // 2nd slot (declared to align // end of _data) slot_t slot[1]; // 1st slot }; class keyrec_t { public: struct hdr_s { uint2 klen; uint2 elen; shpid_t child; }; const char* key() const; const char* elem() const; const char* sep() const; smsize_t klen() const; smsize_t elen() const; smsize_t slen() const; smsize_t rlen() const; shpid_t child() const; private: hdr_s _hdr; char* _body() const; }; struct gistctrl_t { lpid_t root; int2 level; // leaf if 1, non-leaf if > 1 gistctrl_t(); }; class gist_p { public: typedef page_s::slot_t slot_t; enum { data_sz = page_s::data_sz, max_slot = data_sz / sizeof(slot_t) + 2 }; enum { max_tup_sz = data_sz / 3 - sizeof(slot_t) - sizeof(keyrec_t), // 1 BP and 2 entries minimum max_scnt = (data_sz - sizeof(gistctrl_t)) / (sizeof(keyrec_t) + sizeof(slot_t)) + 1 // max # of slots on a page }; gist_p(); ~gist_p(); rc_t format( const lpid_t pid, const gistctrl_t *hdr); rc_t insert( const keyrec_t &tup); rc_t copy(gist_p& rsib); rc_t set_hdr(const gistctrl_t& new_hdr); int level() const; void root(lpid_t& r) const; shpid_t root() const; bool is_leaf() const; bool is_node() const; bool is_root() const; // Gist-specific record access: // masks the BP, so that the first entry on the page has idx = 0; // the BP can be accessed with idx = -1 const int bpSlot = -1; const keyrec_t& rec(int idx) const; int rec_size(int idx) const; int nrecs() const; // the slot index is automatically corrected if a BP is present on the page rc_t insert( const cvec_t& key, const cvec_t& el, int slot, shpid_t child = 0); rc_t remove(int slot); public: bool is_fixed() const; gist_p& operator=(const gist_p& p); smsize_t usable_space(); const lpid_t& pid() const; private: rc_t insert_expand( int idx, int cnt, const cvec_t tp[]); rc_t remove_compress(int idx, int cnt); rc_t overwrite( int idx, int start, const cvec_t& data); // state page_s* _pp; gist_file::page_descr* descr; friend class gist_m; // what I really meant was: //friend rc_t gist_m::_fix_page(gist_p &page, shpid_t pageNo); //friend rc_t gist_m::_unfix_page(gist_p &page); void _compress(int idx = -1); // formerly from page_p inline smsize_t used_space(); smsize_t contig_space(); smsize_t tuple_size(int idx) const; void* tuple_addr(int idx) const; inline bool is_tuple_valid(int idx) const; const void* get_hdr() const; };
#define MAX_SMALL_VEC_SIZE 8 #define CADDR_T const unsigned char * struct vec_pair_t { CADDR_T ptr; size_t len; }; struct VEC_t { int _cnt; size_t _size; vec_pair_t* _base; // pointer to beginning of _pair or malloced // space vec_pair_t _pair[MAX_SMALL_VEC_SIZE]; }; class cvec_t : protected VEC_t { friend class vec_t; // so vec_t can look at VEC_t public: enum dummy_enumid { max_small = MAX_SMALL_VEC_SIZE }; cvec_t() { _cnt = 0; _size = 0; _base = &_pair[0]; } cvec_t(const cvec_t& v1, const cvec_t& v2) { _base= &_pair[0]; set(v1, v2); } cvec_t(const void* p, size_t l) { _base = &_pair[0]; set(p, l); } ~cvec_t(); bool _is_large() const {return _base != &_pair[0];} cvec_t& put(const void* p, size_t l); cvec_t& put(const cvec_t& v); cvec_t& reset() { _cnt = _size = 0; return *this; } cvec_t& set(const cvec_t& v1, const cvec_t& v2) { return reset().put(v1).put(v2); } cvec_t& set(const void* p, size_t l) { return reset().put(p, l); } size_t size() const { return _size; } size_t copy_to(void* p, size_t limit = 0x7fffffff) const; }; class vec_t : public cvec_t { public: vec_t() : cvec_t() {}; vec_t(const cvec_t& v1, const cvec_t& v2) : cvec_t(v1, v2) {}; vec_t(const void* p, size_t l) : cvec_t(p, l) {}; //vec_t(const vec_t& v, size_t offset, size_t limit) //: cvec_t(v, offset, limit) {}; void* ptr(int index) const { return (index >= 0 && index < _cnt) ? (void*)_base[index].ptr : NULL; } size_t len(int index) const { return (index >= 0 && index < _cnt) ? _base[index].len : 0; } };
Comments, questions and suggestions may be directed to gist@postgres.berkeley.edu