Saturday, December 14, 2013

Indexes in Oracle Database - Part 2

This is Part 2 in 3 part series. Read Part 1Part 3 here.

Bitmap indexes

Bitmap indexes are considered as one of the most promising approach for processing complex ad-hoc queries in a database environment that have minimum or nil DML operations performed. In its simplest form a bitmap index on an indexed attribute consists of a set of bits per attribute value, where size of each bitmap is equal to the cardinality of the indexed relation. Because of the smaller size and use of bitmap schemes, bitmap indexes are capable providing highly efficient data retrieval mechanism exploiting the hardware supported bitmap operations like AND, OR, XOR and NOT. The bitmaps are created such that the nth record has a value of X in the indexed attribute if and only if the nth bit in the bitmap associated with the attribute value X is set to 1, and the nth bit in each of the other bitmaps are set to 0.


Figure 1 below shows an example of a bitmap index representation for a set of index attributes, city for example in its simplest form. The indexing is done by storing a bitmap representation of rows stored against each city.
Figure 4
Bit ‘1’ sets the value for a row. From the example there are 2 rows with city ‘bangalore’, 1 row each for cities ‘delhi’ and ‘chennai’ and 3 rows for city ‘mumbai’. In commercial implementations of bitmap indexes, various encoding schemes are used to compact the index size and bitmaps are not actually stored as plain zeros and ones. Some of the algorithms make use of the common bitmap pattern of large number of zeros that follows the ones in the bitmap, storing the zeros in a compressed form.

Oracle’s implementation of bitmap index


Bitmap index, which was introduced in Oracle 7.3 is designed to work on columns that have relatively lesser number of distinct values and tables that have no DML (insert, delete, and update) activity. Bitmap indexes are primarily suited for data warehousing environments where users always query the database rather that modify it. Bitmap indexes differ from b-tree indexes in the way it is stored in the disk. Even though the tree structure is maintained, in case of bitmap indexes, Oracle stores the index key, range of rowid and a bitmap representation of rows falling between the rowid ranges. Because of this reason when the cardinality is low, bitmap indexes are much compact in size as compared to b-tree indexes.

Similar to what is done in case of b-tree indexes; in bitmap indexes also Oracle stores the index in two types of nodes called branch nodes and leaf nodes where each node is one data block in Oracle. Branch nodes in bitmap indexes are more or less same as in b-tree indexes. Branch nodes store the prefixes of key values along the pointers to the leaf nodes. Leaf nodes stores the index key, minimum rowid for the index key, maximum rowid for the index key and a bitmap representing the rows between minimum rowid and maximum rowid.

A bitmap index created on an empty table starts with a root node, which actually a leaf node containing the index keys, respective rowid range and their bitmaps. Oracle uses certain compression algorithm for storing the bitmap. Figure 6 below shows the state of bitmap index with just one leaf node.
Figure 6
Here different from b-tree indexes, not all the rowids are stored in the leaf nodes. Instead a range of rowid and a bitmap to represent values are stored again each index key. As more and more data is added and when there is not enough room in the leaf node to store any more index keys or bitmaps, Oracle splits the node to form two new leaf nodes and in the process the current leaf (root) node becomes a branch node.
Figure 7
Branch nodes in bitmap indexes are same as branch nodes in b-tree indexes. Branch nodes store the pointers to the leaf nodes along with the minimum prefix of the index keys that is good enough for making the branching decisions.

Illustration from Oracle database


A sample table EMP was created to look at the way oracle stores the bitmaps indexes in its data files. Following data set was used to populate the database and a bitmap index (EMP_COMM_BITMAP_IDX) was created on column comments where the table has three distinct values out of the total 10 rows.

ID
NAME
DEPT
COMMENTS
2
AAA
A
222222222222222…….
3
AAA
B
222222222222222…….
8
AAA
A
000000000000000…….
9
AAA
C
000000000000000…….
6
AAA
A
111111111111111…….
7
AAA
B
111111111111111…….
4
AAA
A
111111111111111…….
5
AAA
B
111111111111111…….
0
AAA
A
000000000000000…….
1
AAA
B
111111111111111…….



Note that comments column actually contains strings of length 3500 characters and in the table above only a partial string is shown due space constraints. The reason for selecting a column with such a big string is to have the index spread across at least 2-3 oracle data blocks which is of 8KB size. Shown below is the tree dump for EMP_COMM_BITMAP_IDX.



----- begin tree dump – BITMAP INDEX
branch: 0x3c00b8a 62917514 (0: nrow: 2, level: 1)
   leaf: 0x3c00b8b 62917515 (-1: nrow: 2 rrow: 2)
   leaf: 0x3c00b8c 62917516 (0: nrow: 1 rrow: 1)
----- end tree dump – BITMAP INDEX


The dump shows one branch node and two leaf nodes and the branch node contain two entries and one leaf node contains one entry and the other leaf node contain two entries. Here the 3 entries spread in 2 leaf nodes represent the 10 rows in the database table. Following diagram if figure 8 was constructed by using the data from the branch and leaf data blocks where the bitmap index data is stored. 

Figure 8
Here the branch node contains two index key prefixes and the pointers to leaf blocks. The leaf bock 2955 contains bitmap information for index key 00000…. (character string of 3500 zeros) and 11111… (character string of 3500 one’s) and the rowid range for the rows. Leaf block 2956 contains bitmap information for index key 22222… (character of 3500 2’s) with a range of rowid’s. For a given index keys Oracle finds out the ROWIDs using the ROWID range and bitmap information together.

A b-tree index for the same data set takes much more space in the disk as b-tree index stores individual rowid against each index key. Given below is the tree dump for a b-tree index created for the comments column with same data set as that of the bitmap index. Here for b-tree index 8 data blocks are used even though the data is the same. In case if bitmap index it was only 3 data blocks.
  
----- begin tree dump – B-TREE INDEX
branch: 0x3c00b8a 62917514 (0: nrow: 2, level: 2)
   branch: 0x3c00b8f 62917519 (-1: nrow: 3, level: 1)
      leaf: 0x3c00b8b 62917515 (-1: nrow: 2 rrow: 2)
      leaf: 0x3c00b8c 62917516 (0: nrow: 2 rrow: 2)
      leaf: 0x3c00b8d 62917517 (1: nrow: 2 rrow: 2)
   branch: 0x3c00b91 62917521 (0: nrow: 2, level: 1)
      leaf: 0x3c00b8e 62917518 (-1: nrow: 2 rrow: 2)
      leaf: 0x3c00b90 62917520 (0: nrow: 2 rrow: 2)
----- end tree dump – B-TREE INDEX

Comparing the tree dumps of a b-tree index and a bitmap index, it is clear that the branch nodes works exactly the same way for both types of indexes and when it comes to leaf nodes it’s totally different. 

Read Part 3 here

No comments:

Post a Comment