Indexing and Hashing
Database Indexing and Hashing
Database Indexing
An index is a data structure containing index entries. Each index entry contains search key and corresponding database location of record containing search key.
Index is used to speed up database search. We first search in index file and then go to database record directly.
An index is very small as compared to the original database file.
Index increases database modification overhead. We have to update index whenever database is updated.
Index increases database space overhead.
We must use index judiciously because creating too many index may defeat the purpose of database indexing.
Types of indexing scheme
1. Ordered index
In this search keys are stored in order. An index is always sorted on on index key.
2. Hash index
In this search keys are stored according to Hash order using Hash Function.
Ordered index
In this we have to maintain indexed sequential file. It is an ordered sequential file with an index.
Types of index
1. Primary index (clustering index)
In this index search key and order key of the database are the same. It is an example of sparse index. In sparse index there is an index record corresponding to only certain keys in the database. Index contains pointers to block of records.
Example
2. Secondary index (non clustering index)
In this index search key and order key of the database are not the same. It is an example of dense index. In dense index there is an index record corresponding to every record in the database.Index contains pointers to records.
Example
3. Multilevel Indexing
When primary index is too large to fit into memory. It is broken into outer index and inner index. The outer index is primary sparse Index where is the inner index may be sparse or dense.
We can apply this idea to create one more level of multilevel index and so on.
Example
Data Structure used for ordered index
B- tree
It builds upon the idea of binary search tree. It is a data structure used for disk.
It is used to implement index.
It is a balance search tree in which all the leaves are at the same level.
A B- tree of order M has the following properties-
1. Each and every node has between M/2 and M keys.
2. The data items are stored at leaves.
3. All leaves are at the same level.
4. A non leaf node with K keys will have K + 1 children.
5. The root is either a leaf or has between 2 and M + 1 children.
6. All non leaf nodes except the root has between M/2 and M + 1 children.
7. The root has between 1 and M keys.
8. A large value of M reduces the height of B tree.
example B tree with M=4
Bit Map Index
Bitmap or bit vector of search keys
It is a data structure used for indexing.
It is a string of ones and zeros to indicate whether search key is present or absent in corresponding records of the database.
Example
Database Hashing
In this Hash Function is applied to obtain corresponding location where record containing key is to be stored.
h(k) ---> sector on disk
Where h(k) is hash function applied on key k.
Example
Collision
When more than one key hashes to same database location (bucket) it is called collision. If h(k1)=h(k2), it means collision.
Bucket Overflow
When a database bucket is not having enough space to store keys it is called bucket overflow.
Example
h(k)=k mod m, m is number of database buckets.
Bucket overflow resolution
1. Static hashing
2. Dynamic hashing
Static hashing
In static hashing, hashing function is never changed. In this bucket overflow problem can be resolved in two ways -
1. Chaining or open hashing
2. Open addressing closed hashing
Chaining or open hashing
In this another bucket is chained to primary bucket in case of bucket overflow.
Example
Open addressing closed hashing
It probes for another bucket in case of bucket overflow. It is of two types linear probing and quadratic probing.
In linear probing we define hashing function h(k,i) where k is key and i is ith attempt for hashing. In case of bucket overflow in ith attempt i+1th attempt is used. We define h(k,i)=(h(k)+ci)mod m
and h(k)=k mod m, m is number of database buckets and c is constant.
In case of overflow it probes for immediate next bucket.
Example
In quadratic probing we define hashing function h(k,i) where k is key and i is ith attempt for hashing. In case of bucket overflow in ith attempt i+1th attempt is used. We define h(k,i)=(h(k)+ci*i)mod m
and h(k)=k mod m, m is number of database buckets and c is constant.
Problems of static hashing
If data is much more than m buckets then it cannot adjust automatically to the amount or volume of data.
Dynamic hashing
In this Hash Function is adjusted to the volume of data. We have a series of hash function g(k)={h1(k),h2(k)......}
Where h0(k) is called primary bucket hash function.
When primary bucket overflows then h1(k) function is used. When h1(k) bucket overflows h2(k) function is used and so on.
Example
If data increases buckets can expand and if data reduces buckets can collapse.