Saturday, December 14, 2013

Indexes in Oracle Database - Part 2

This is Part 2 in 3 part series. Read Part 1Part 3 here.

Bitmap indexes

Bitmap indexes are considered as one of the most promising approach for processing complex ad-hoc queries in a database environment that have minimum or nil DML operations performed. In its simplest form a bitmap index on an indexed attribute consists of a set of bits per attribute value, where size of each bitmap is equal to the cardinality of the indexed relation. Because of the smaller size and use of bitmap schemes, bitmap indexes are capable providing highly efficient data retrieval mechanism exploiting the hardware supported bitmap operations like AND, OR, XOR and NOT. The bitmaps are created such that the nth record has a value of X in the indexed attribute if and only if the nth bit in the bitmap associated with the attribute value X is set to 1, and the nth bit in each of the other bitmaps are set to 0.


Figure 1 below shows an example of a bitmap index representation for a set of index attributes, city for example in its simplest form. The indexing is done by storing a bitmap representation of rows stored against each city.
Figure 4
Bit ‘1’ sets the value for a row. From the example there are 2 rows with city ‘bangalore’, 1 row each for cities ‘delhi’ and ‘chennai’ and 3 rows for city ‘mumbai’. In commercial implementations of bitmap indexes, various encoding schemes are used to compact the index size and bitmaps are not actually stored as plain zeros and ones. Some of the algorithms make use of the common bitmap pattern of large number of zeros that follows the ones in the bitmap, storing the zeros in a compressed form.

Oracle’s implementation of bitmap index


Bitmap index, which was introduced in Oracle 7.3 is designed to work on columns that have relatively lesser number of distinct values and tables that have no DML (insert, delete, and update) activity. Bitmap indexes are primarily suited for data warehousing environments where users always query the database rather that modify it. Bitmap indexes differ from b-tree indexes in the way it is stored in the disk. Even though the tree structure is maintained, in case of bitmap indexes, Oracle stores the index key, range of rowid and a bitmap representation of rows falling between the rowid ranges. Because of this reason when the cardinality is low, bitmap indexes are much compact in size as compared to b-tree indexes.

Similar to what is done in case of b-tree indexes; in bitmap indexes also Oracle stores the index in two types of nodes called branch nodes and leaf nodes where each node is one data block in Oracle. Branch nodes in bitmap indexes are more or less same as in b-tree indexes. Branch nodes store the prefixes of key values along the pointers to the leaf nodes. Leaf nodes stores the index key, minimum rowid for the index key, maximum rowid for the index key and a bitmap representing the rows between minimum rowid and maximum rowid.

A bitmap index created on an empty table starts with a root node, which actually a leaf node containing the index keys, respective rowid range and their bitmaps. Oracle uses certain compression algorithm for storing the bitmap. Figure 6 below shows the state of bitmap index with just one leaf node.
Figure 6
Here different from b-tree indexes, not all the rowids are stored in the leaf nodes. Instead a range of rowid and a bitmap to represent values are stored again each index key. As more and more data is added and when there is not enough room in the leaf node to store any more index keys or bitmaps, Oracle splits the node to form two new leaf nodes and in the process the current leaf (root) node becomes a branch node.
Figure 7
Branch nodes in bitmap indexes are same as branch nodes in b-tree indexes. Branch nodes store the pointers to the leaf nodes along with the minimum prefix of the index keys that is good enough for making the branching decisions.

Illustration from Oracle database


A sample table EMP was created to look at the way oracle stores the bitmaps indexes in its data files. Following data set was used to populate the database and a bitmap index (EMP_COMM_BITMAP_IDX) was created on column comments where the table has three distinct values out of the total 10 rows.

ID
NAME
DEPT
COMMENTS
2
AAA
A
222222222222222…….
3
AAA
B
222222222222222…….
8
AAA
A
000000000000000…….
9
AAA
C
000000000000000…….
6
AAA
A
111111111111111…….
7
AAA
B
111111111111111…….
4
AAA
A
111111111111111…….
5
AAA
B
111111111111111…….
0
AAA
A
000000000000000…….
1
AAA
B
111111111111111…….



Note that comments column actually contains strings of length 3500 characters and in the table above only a partial string is shown due space constraints. The reason for selecting a column with such a big string is to have the index spread across at least 2-3 oracle data blocks which is of 8KB size. Shown below is the tree dump for EMP_COMM_BITMAP_IDX.



----- begin tree dump – BITMAP INDEX
branch: 0x3c00b8a 62917514 (0: nrow: 2, level: 1)
   leaf: 0x3c00b8b 62917515 (-1: nrow: 2 rrow: 2)
   leaf: 0x3c00b8c 62917516 (0: nrow: 1 rrow: 1)
----- end tree dump – BITMAP INDEX


The dump shows one branch node and two leaf nodes and the branch node contain two entries and one leaf node contains one entry and the other leaf node contain two entries. Here the 3 entries spread in 2 leaf nodes represent the 10 rows in the database table. Following diagram if figure 8 was constructed by using the data from the branch and leaf data blocks where the bitmap index data is stored. 

Figure 8
Here the branch node contains two index key prefixes and the pointers to leaf blocks. The leaf bock 2955 contains bitmap information for index key 00000…. (character string of 3500 zeros) and 11111… (character string of 3500 one’s) and the rowid range for the rows. Leaf block 2956 contains bitmap information for index key 22222… (character of 3500 2’s) with a range of rowid’s. For a given index keys Oracle finds out the ROWIDs using the ROWID range and bitmap information together.

A b-tree index for the same data set takes much more space in the disk as b-tree index stores individual rowid against each index key. Given below is the tree dump for a b-tree index created for the comments column with same data set as that of the bitmap index. Here for b-tree index 8 data blocks are used even though the data is the same. In case if bitmap index it was only 3 data blocks.
  
----- begin tree dump – B-TREE INDEX
branch: 0x3c00b8a 62917514 (0: nrow: 2, level: 2)
   branch: 0x3c00b8f 62917519 (-1: nrow: 3, level: 1)
      leaf: 0x3c00b8b 62917515 (-1: nrow: 2 rrow: 2)
      leaf: 0x3c00b8c 62917516 (0: nrow: 2 rrow: 2)
      leaf: 0x3c00b8d 62917517 (1: nrow: 2 rrow: 2)
   branch: 0x3c00b91 62917521 (0: nrow: 2, level: 1)
      leaf: 0x3c00b8e 62917518 (-1: nrow: 2 rrow: 2)
      leaf: 0x3c00b90 62917520 (0: nrow: 2 rrow: 2)
----- end tree dump – B-TREE INDEX

Comparing the tree dumps of a b-tree index and a bitmap index, it is clear that the branch nodes works exactly the same way for both types of indexes and when it comes to leaf nodes it’s totally different. 

Read Part 3 here

Sunday, December 8, 2013

Indexes in Oracle Database - Part 1

This is Part 1 in 3 part series. Read Part 2Part 3 here.

In an Oracle Database, indexes are optional structures associated with database tables and clusters, used for speeding up the data retrieval. Just like an index in a book helps finding the information inside the book faster, an index in the oracle database provides a faster access path to the table data. Indexes are independent of the data in the associated table and indexes can be created or dropped without affecting the base table and its data. On the other hand, once created oracle automatically maintains the indexes and any changes in table data are reflected in the index automatically with no additional action from the users manipulating the table data. In order to support large data volumes, indexes are stored in disks using data structures that provide efficient retrieval of table data with minimum disk I/O. 

There are different types of indexes oracle supports and internal storage wise there are mainly 2 types, the b-tree indexes and bitmap indexes. A b-tree index uses a tree data structure called balanced tree (b-tree) to store the data to support quick retrieval of data.  In case of bitmap indexes, oracle stores a bitmap of rowid (unique identifier used to identify a row in Oracle) with distinct values the indexed column has. Here in this 2 blog series I am trying to illustrate the working of B-Tree and Bitmap indexes in Oracle database.

B-tree indexes

B-tree indexes are the standard/traditional tree indexes oracle has been using since its earlier versions, to provide improved speed for data retrieval operations from database tables. Oracle stores the indexes as balanced trees (B-Tree) in the disk in order to minimize the disk I/O while retrieving data. For the columns for which index is created, the data is sorted and stored as rowid – index key value pair for each row.  In oracle, the b-Tree indexes are stored as an ordered list of values divided into block-wide ranges. The end points of the ranges along with the pointers to the blocks are stored as a search tree where a value from a set of N entries can be found in log (n) time.

What is a B-tree ?

Tree structures support various kind of data manipulation operations like search, insert, delete on a set of data in a time proportional to the height of the tree. B-Tree or balanced tree is a variant of the tree data structure which is optimized for situations where it is not possible to keep the entire tree structure in the main memory and require a secondary storage such as disks to maintain the tree. In operations where large volume of data is consumed, systems spend most of the time in waiting for the data to be retrieved from disk which is the most expensive and time consuming of all the operations. A B-Tree stores the data in such way that, the data stored in the tree structure can be retrieved with least number of disk accesses, thus speeding up the data retrieval process.

In a b-tree, each node is designed to accommodate more than one keys and the number of sub trees originating from each node depends on the number keys stored in the node. This design characteristic of b-tree to branch out in number of directions and to store large number of keys in each node helps in keeping height of the tree relatively small which in turn reduces the I/O required while doing operations like searching an element or adding or deleting an element. Every B-tree is said to be of some order “n”, meaning each node in the b-tree can contain “n” to “2n” number of keys.


For example, following diagram (Figure 1) shows a b-tree of order 1 storing numbers from 0 - 9. Being a b-tree of order 1, each node in the tree can have 1 to 2 keys and a maximum of 3 pointers pointing to the sub trees.  On each node keys are kept in sorted order.


Figure 1


In this case any key in the tree can be searched with a maximum of 3 node look up. Here it may not look significant since the example shows a b-tree of order 1 storing 10 keys. In practical scenarios like an oracle index storing hundreds of thousands of keys, b-tree provides large savings in terms of disk I/O, hence speeding up the data retrieval significantly. For example, in a b-tree storing 10,000,000 keys with 50 keys in each node, in order to search a key, a maximum of 4 node look up is good enough.

Oracle’s implementation of b-tree index

The basic storage unit for an oracle database is a ‘data block’; meaning when oracle database processes read or write data from disk it is done for a data block. One data block corresponds to a specific number of bytes of physical database space on disk depending on the data block size mentioned while creating the database. The next level of logical database space is an extent. An extent is a specific number of contiguous data blocks allocated for storing a specific type of information. The level of logical database storage above an extent is called a segment and oracle allocates different type of segments to store different type of objects. When an index is created, index is stored by allocating an index segment in the tablespace specified in the index creation statement.

For B-tree indexes in oracle, each block allocated for the index is considered as a node in the b-tree data structure. B-tree indexes have two types of nodes, leaf nodes and branch nodes. Leaf nodes store the index keys, which is the basic unit of an index, in sorted order together with the rowid information which can be used to locate the row in the database table.  Branch nodes store a reduced list of index keys, and pointers to the child blocks containing the keys.  In branch nodes, oracle does not always store the complete index key, but stores only the minimum key prefix that is required to make a branching decision between two keys.

When an index is created on an empty table, a node is created which is called the root node (which is actually and empty data block) and as data gets inserted in the table, oracle stores index keys in the node in sorted order, along with the rowid information. At this point of time root node acts as a leaf node. A root node at this stage can be represented as in figure 2.
Figure 2

As more keys get added to the index, the root node becomes full and oracle splits the root node to create two leaf nodes to accommodate the additional entries.  While doing so, the root node becomes a branch node storing block address of 2 of the newly created leaf nodes along with first index key stored in those blocks.  All the index keys and their rowid information are distributed to the newly created leaf nodes. Figure 3 represents the state of the b-tree index at this point of time. 
Figure 3




As more and more data gets added to the table, more and more leaf nodes are created and more entries are made in the root node to point to the newly created leaf nodes and at one point, the root node may not have enough space to hold anymore entries to store the block addresses of the leaf nodes that are getting created. The root node is again split to create another level of branch nodes in order to accommodate the growing number of leaf nodes. This process is continued so that all the leaf blocks are maintained at the same depth, and retrieving any record from any part of the index will take roughly the same amount of time. Apart from the index key, rowid pair, leaf nodes also stores block addresses of the adjacent leaf nodes so that the leaf nodes are cyclically linked to each other.

Illustration from Oracle Database

To have a better understanding at Oracle’s way of storing indexes, a sample table called EMP was created and an index EMP_COMM_IDX was created on column COMMENTS which of size 3500 characters. A column of such a big size was chosen to produce an index layout that is spread across multiple leaf/branch nodes with minimum number of records. In this case 8KB being the data block size, Oracle will be able to store only 2 index keys in one data block and with 10 rows itself index will have 5 blocks associated to it. Following table shows the content of EMP table, the data set used in this example. Note that the comments column in the EMP table actually contains character data of size 3500 characters, and only a portion of the data is actually shown in the table given below because of space constraints.

ID
NAME
DEPT
COMMENTS
2
AAA
A
22222222222222222222…..
3
AAA
B
33333333333333333333…..
8
AAA
A
44444444444444444444…..
9
AAA
C
99999999999999999999…..
6
AAA
A
66666666666666666666…..
7
AAA
B
77777777777777777777….
4
AAA
A
88888888888888888888….
5
AAA
B
55555555555555555555….
0
AAA
A
00000000000000000000….
1
AAA
B
11111111111111111111…..

The index was created after populating 10 records in the table with comments column having strings of 3500 characters length.  Shown below is the block dump for the index EMP_COMM_IDX from the database generated using oracle dump utility.

     ----- begin tree dump
branch: 0x3c00b8a 62917514 (0: nrow: 5, level: 1)
   leaf: 0x3c00b8b 62917515 (-1: nrow: 2 rrow: 2)
   leaf: 0x3c00b8c 62917516 (0: nrow: 2 rrow: 2)
   leaf: 0x3c00b8d 62917517 (1: nrow: 2 rrow: 2)
   leaf: 0x3c00b8e 62917518 (2: nrow: 2 rrow: 2)
   leaf: 0x3c00b8f 62917519 (3: nrow: 2 rrow: 2)
----- end tree dump

The dump shows a total of six blocks allocated for the index EMP_COMM_IDX, out of which five are leaf nodes and the other one a branch node. Here each leaf node has 2 index keys stored in it and the branch node contains prefixes for five index keys and their respective child leaf block’s addresses. Note that in branch nodes, oracle does not store the entire index key, but only the prefixes of the index keys that are good enough for making the branching decision. Looking further into each block using oracle’s data block dump utility following diagram was developed with the real block addresses that Oracle has used to represent the index in the data file.

Figure 4
Here the branch node block address is 2954 and leaf nodes starts from block 2955 to 2959. In the branch node Oracle has stored only one character (2 for a string of 3500 2’s for example) for each index key which enough for making the branching decision while searching for a key. Apart from rowid/keys pairs leaf nodes also stores block address for adjacent leaf nodes.

Read Part 2 here