Bitmap indexes
Bitmap indexes are
considered as one of the most promising approach for processing complex ad-hoc
queries in a database environment that have minimum or nil DML operations
performed. In its simplest form a bitmap index on an indexed attribute consists
of a set of bits per attribute value, where size of each bitmap is equal to the
cardinality of the indexed relation. Because of the smaller size and use of
bitmap schemes, bitmap indexes are capable providing highly efficient data
retrieval mechanism exploiting the hardware supported bitmap operations like
AND, OR, XOR and NOT. The bitmaps are created such that the nth
record has a value of X in the indexed attribute if and only if the nth
bit in the bitmap associated with the attribute value X is set to 1, and the nth
bit in each of the other bitmaps are set to 0.
Figure 1 below shows an
example of a bitmap index representation for a set of index attributes, city
for example in its simplest form. The indexing is done by storing a bitmap
representation of rows stored against each city.
Figure 4 |
Bit ‘1’ sets the value for a row. From the
example there are 2 rows with city ‘bangalore’, 1 row each for cities ‘delhi’
and ‘chennai’ and 3 rows for city ‘mumbai’. In commercial implementations of
bitmap indexes, various encoding schemes are used to compact the index size and
bitmaps are not actually stored as plain zeros and ones. Some of the algorithms
make use of the common bitmap pattern of large number of zeros that follows the
ones in the bitmap, storing the zeros in a compressed form.
Oracle’s implementation of bitmap index
Bitmap index, which was introduced
in Oracle 7.3 is designed to work on columns that have relatively lesser number
of distinct values and tables that have no DML (insert, delete, and update)
activity. Bitmap indexes are primarily suited for data warehousing environments
where users always query the database rather that modify it. Bitmap indexes
differ from b-tree indexes in the way it is stored in the disk. Even though the
tree structure is maintained, in case of bitmap indexes, Oracle stores the
index key, range of rowid and a bitmap representation of rows falling between
the rowid ranges. Because of this reason when the cardinality is low, bitmap
indexes are much compact in size as compared to b-tree indexes.
Similar to what is done in
case of b-tree indexes; in bitmap indexes also Oracle stores the index in two
types of nodes called branch nodes and leaf nodes where each node is one data
block in Oracle. Branch nodes in bitmap indexes are more or less same as in
b-tree indexes. Branch nodes store the prefixes of key values along the
pointers to the leaf nodes. Leaf nodes stores the index key, minimum rowid for
the index key, maximum rowid for the index key and a bitmap representing the
rows between minimum rowid and maximum rowid.
A bitmap index created on an empty table starts with a root node, which
actually a leaf node containing the index keys, respective rowid range and
their bitmaps. Oracle uses certain compression algorithm for storing the
bitmap. Figure 6 below shows the state of bitmap index with just one leaf node.
Figure 6 |
Here different from b-tree indexes, not all the
rowids are stored in the leaf nodes. Instead a range of rowid and a bitmap to
represent values are stored again each index key. As more and more data is
added and when there is not enough room in the leaf node to store any more
index keys or bitmaps, Oracle splits the node to form two new leaf nodes and in
the process the current leaf (root) node becomes a branch node.
Figure 7 |
Branch nodes in bitmap indexes are same as
branch nodes in b-tree indexes. Branch nodes store the pointers to the leaf
nodes along with the minimum prefix of the index keys that is good enough for
making the branching decisions.
Illustration from Oracle database
A sample table EMP was created to look at the
way oracle stores the bitmaps indexes in its data files. Following data set was
used to populate the database and a bitmap index (EMP_COMM_BITMAP_IDX) was
created on column comments where the table has three distinct values out of the
total 10 rows.
ID
|
NAME
|
DEPT
|
COMMENTS
|
2
|
AAA
|
A
|
222222222222222…….
|
3
|
AAA
|
B
|
222222222222222…….
|
8
|
AAA
|
A
|
000000000000000…….
|
9
|
AAA
|
C
|
000000000000000…….
|
6
|
AAA
|
A
|
111111111111111…….
|
7
|
AAA
|
B
|
111111111111111…….
|
4
|
AAA
|
A
|
111111111111111…….
|
5
|
AAA
|
B
|
111111111111111…….
|
0
|
AAA
|
A
|
000000000000000…….
|
1
|
AAA
|
B
|
111111111111111…….
|
Note that comments column actually contains
strings of length 3500 characters and in the table above only a partial string
is shown due space constraints. The reason for selecting a column with such a
big string is to have the index spread across at least 2-3 oracle data blocks
which is of 8KB size. Shown below is the tree dump for EMP_COMM_BITMAP_IDX.
----- begin tree dump – BITMAP INDEX
branch: 0x3c00b8a 62917514 (0: nrow: 2,
level: 1)
leaf: 0x3c00b8b 62917515 (-1: nrow: 2 rrow: 2)
leaf: 0x3c00b8c 62917516 (0: nrow: 1 rrow: 1)
----- end tree dump – BITMAP INDEX
The dump shows one branch node and two leaf
nodes and the branch node contain two entries and one leaf node contains one
entry and the other leaf node contain two entries. Here the 3 entries spread in
2 leaf nodes represent the 10 rows in the database table. Following diagram if figure 8 was
constructed by using the data from the branch and leaf data blocks where the
bitmap index data is stored.
Figure 8 |
Here the branch node contains two index key
prefixes and the pointers to leaf blocks. The leaf bock 2955 contains bitmap
information for index key 00000…. (character string of 3500 zeros) and 11111…
(character string of 3500 one’s) and the rowid range for the rows. Leaf block
2956 contains bitmap information for index key 22222… (character of 3500 2’s)
with a range of rowid’s. For a given index keys Oracle finds out the ROWIDs
using the ROWID range and bitmap information together.
A b-tree index for the same
data set takes much more space in the disk as b-tree index stores individual
rowid against each index key. Given below is the tree dump for a b-tree index
created for the comments column with same data set as that of the bitmap index.
Here for b-tree index 8 data blocks are used even though the data is the same. In
case if bitmap index it was only 3 data blocks.
----- begin tree dump – B-TREE INDEX
branch: 0x3c00b8a 62917514 (0: nrow: 2,
level: 2)
branch: 0x3c00b8f 62917519 (-1: nrow: 3, level: 1)
leaf: 0x3c00b8b 62917515 (-1: nrow: 2 rrow: 2)
leaf: 0x3c00b8c 62917516 (0: nrow: 2 rrow: 2)
leaf: 0x3c00b8d 62917517 (1: nrow: 2 rrow: 2)
branch: 0x3c00b91 62917521 (0: nrow: 2, level: 1)
leaf: 0x3c00b8e 62917518 (-1: nrow: 2 rrow: 2)
leaf: 0x3c00b90 62917520 (0: nrow: 2 rrow: 2)
----- end tree
dump – B-TREE INDEX
Comparing the tree dumps of a b-tree index and a
bitmap index, it is clear that the branch nodes works exactly the same way for
both types of indexes and when it comes to leaf nodes it’s totally different.
Read Part 3 here
No comments:
Post a Comment