DB C8 Indexing

Basic Concepts

Indexing mechanisms used to speed up access to desired data. e.g author catalog in library.

Search Key: attribute to set of attributes used to look up records in a file.

index file: consists of records (called index entries) of the form <search-key><pointer>.

  • Index files are typically much smaller than the original file.
  • Two basic kinds of indices:
    • Ordered indices: search keys are stored in sorted order
    • Hash indices: search keys are distributed uniformly across “buckets” using a hash function.

Index Evaluation

  • Access types supported efficiently. e.g
    • Point query: records with a specified value in the attribute.
    • Range query: or records with an attribute value falling in a specified range of values.
  • Access time
  • Insertion time
  • Deletion time
  • Space overhead

Time and Space efficiency is the most significant indicators in DBS.

Ordered Indices

In an ordered index, index entries are stored sorted on the search key value. e.g, author catalog in library.

Primary and secondary

  • Sequentially ordered file: The records in the file (data file) are ordered by a search-key.
  • Primary index,主索引 / clustering index,聚集索引: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.
  • Index-sequential file, 索引顺序文件 : ordered sequential file with a primary index.

The search key of a primary index is usually but not necessarily the primary key.

  • Secondary index, 辅助索引 / non-clustering index,非聚集索引: an index whose search key specifies an order different from the sequential order of the file.

    qFrequently, one wants to find all the records whose values in a certain field which is not the search-key of the primary index satisfy some condition.(实际应用中常有多种属性作为查询条件)

    ØExample : In the account database stored sequentially by account number, we may want to find all accounts with a specified balance or range of balances. (see next slide)

    qWe can have a secondary index with an index record for each search-key value; index record points to a bucket that contains pointers to all the actual records with that one particular search-key value.

    辅助索引不能使用稀疏索引,每条记录都必须有指针指向.

    但search-key常存在重复项—如700,而index entry不能有重复,否则查找算法复杂化, 为此,使用bucket结构。**

Dense and sparse

  • Dense index, 稠密索引: Index entry appears for every search-key value in the file.

    For tuples with search-key value, not all tuples will be indexed.

  • Sparse index, 稀疏索引: contains index entries for only some search-key values. ( Usually, one block of data gives an index entry, a block contains a number of ordered data records)

    Sparse index is applicable only when data file records are sequentially ordered on search-key!

    To locate a record with search-key value K :

    1. Find index record with largest search-key value < K
    2. Search file sequentially starting at the record to which the index entry points.

Comparison of dense and sparse

  1. Sparse consumes less space and less maintenance overhead for insertions and deletions.
  2. dense is generally quicker for locating records.
  3. dense is applicable for non-sequentially files.

Good tradeoff

Sparse index with an index entry for every block in file, corresponding to least search-key value in the block.

Multilevel Index

If primary index is too big to fit in memory, access becomes expensive.

qTo reduce number of disk accesses to index records, treat primary index kept on disk as a sequential file and construct a sparse index on it. (把内层索引文件看作顺序数据文件一样,在其上建立外层的稀疏索引)

ØOuter index – a sparse index of primary index

ØInner index – the primary index file

qIf even outer index is too large to fit in main memory, yet another level of index can be created, and so on.(可以推广到任意多层索引)

qIndices at all levels must be updated on insertion or deletion from the file.

Update

Deletion

  1. The system finds the record in the data file, then deletes the record.
  2. updates index file: (for single-level index deletion).
Dense indices

The deleted record was:

  • the only record with its particular search-key value: (单记录, search-key唯一性)

    delete corresponding index entry from index file.

  • Not the only record:

    • secondary index: multi pointers to all records with the same search-key value:

    delete the pointer to the deleted record from the index entry.

    • primary index:

      • deleted record was the first record: change the pointer to the next record.
      • Otherwise: do nothing.
Sparse

The search-key values of the deleted record:

  • does not appear in the index: do nothing.
  • Otherwise:
    • if an index entry for the search key exists in the index file, it is deleted by replacing the entry with the next search-key value in the data file (in search-key order).
    • If the next search-key value already has an index entry, the index entry is deleted instead of being replaced.

Insertion

  1. Perform a lookup using the search-key value appearing in the record to be inserted.
  2. Insert the record.
  3. Update the index file.

B±Tree Index

B±Tree File Organization

B-Tree Index Files

Indices on Multiple Keys

Indexing on Flash

Indexing in Main Memory

Write Optimized Indices

Log Structured Merge (LSM) Tree

Buffer Tree

Bitmap Indices


DB C8 Indexing
http://example.com/2023/05/08/DB-08/
Author
Tekhne Chen
Posted on
May 8, 2023
Licensed under