We all use indexing every day, but I’ve seen many people being unclear about how it works and the use of index. So, let us try to understand indexing once again. Let us read a small story.
Long ago, there was a big library in an ancient city. It had thousands of books, but the books were not arranged in any order in the book shelves. So, each time a person asked for a book to the librarian, the librarian had no way but to check every book to find the required book that the person wanted. Finding the desired book used to take hours for the librarian, and most of the time, the persons who asked for the book had to wait for a long time.
[Hmm… sounds like a table that has no primary key. When data is searched for in a table, the database engine has to scan through the entire table to find the corresponding row, which is very slow.]
Life was getting miserable for the librarian as the number of books and persons asking for books increased day by day. Then one day, a wise guy came to the library, and seeing the librarian’s miserable life, he advised him to number each book and arrange the book shelves according to their numbers. “What benefit would I get?”, asked the librarian. The wise guy answered, “Well, now if somebody gives you a book number and asks for that book, you will be able to find the shelf quickly that contains the book’s number, and within that shelf, you can find that book very quickly as books are arranged according to their number”.
[Numbering the books or book indexing sounds like creating a primary key in a database table. When you create a primary key in a table, a clustered index tree is created, and all data pages containing the table rows are physically sorted in the file system according to their primary key values. Each data page contains rows, which are also sorted within the data page according to their primary key values. So, each time you ask for any row from the table, the database server finds the corresponding data page first using the clustered index tree (like finding the book shelf first) and then finds the desired row within the data page that contains the primary key value (like finding the book within the shelf).]
“This is exactly what I need!” The excited librarian instantly started numbering the books and arranging them across different book shelves. He spent a whole day to do this arrangement and at the end of the day, he tested and found that a book now could be found using the number within no time at all! The librarian was extremely happy.
[That’s exactly what happens when you create a primary key in a table. Internally, a clustered index tree is created, and the data pages are physically sorted within the data file according to the primary key values. As you can easily understand, only one clustered index can be created for a table as the data can be physically arranged only using one column value as the criteria, the primary key. It’s like the books can only be arranged using one criterion such as the book number here.]
Wait! The problem was not completely solved yet. The very next day, a person asked a book by the book’s name (he didn’t have the book’s number, all he had was the book’s name). The poor librarian had no way but to scan all the numbered books from 1 to N to find the one the person asked for. He found the book in the 67th shelf. It took 20 minutes for the librarian to find the book. Earlier, he used to take 2-3 hours to find a book when they were not arranged in the shelves, so that was an improvement. But, comparing to the time to search a book using its number (30 seconds), this 20 minute seemed to be a very high amount of time to the librarian. So, he asked the wise man how to improve on this.
[This happens when you have a Product table where you have a primary key ProductID, but you have no other index in the table. So, when a product is to be searched using the Product Name, the database engine has no way but to scan all physically sorted data pages in the file to find the desired item.]
The wise man told the librarian: “Well, as you have already arranged your books using their serial numbers, you cannot re-arrange them. Better create a catalog or index where you will have all the book’s names and their corresponding serial numbers. In this catalog, arrange the book names in their alphabetic number and group the book names using each alphabet so that if anyone wants to find a book named “Seven Wonders of the World”, you just follow these steps to find the book:
1. Jump into the section “S” of your book name “catalog” and find the book name there.
2. Read the corresponding serial number of the book and find the book using the serial number (you already know how to do this).
“You are a genius!”, exclaimed the librarian. Spending some hours, he immediately created the “Book name” catalog, and with a quick test, he found that he only required a minute (30 seconds to find the book’s serial number in the “Book name” catalog and 30 seconds to find the book using the serial number) to find a book using its name.
The librarian thought that people might ask for books using several other criteria like book name and/or author’s name etc., so he created a similar catalog for author names, and after creating these catalogs, the librarian could find any book using a common book finding criteria (serial number, book name, author’s name) within a minute. The miseries of the librarian ended soon, and lots of people started gathering at the library for books as they could get books really fast, and the library became very popular.
The librarian started living his life happily ever after. End of story.
Similarly, you can create multiple indexes for any data-set which try to catalog the different ways in which a user will search for data and this improves the overall performance of any search operation.