How many indexes does my table need?
While the question “How many indexes should I have on my tables?” is good and one I get asked about a lot, it is not necessarily an easy one to answer as not all answers are correct for every environment. So care needs to be taken when designing an indexing strategy which will ensure your application performs well in all aspects.
SQL Tuners’ Best Practices guidelines indicate indexes should follow a general rule of thumb:
OLTP systems: 5 to 7 indexes are best
Data warehouse systems: theoretically as many as you need and supported by the RDBMS
Let’s talk about OLTP (OnlineTransaction Processing) first.
OLTP systems by their nature generate large numbers of read and write transactions per second. OLTP systems are required to deliver and write data very quickly as typically there is someone or something waiting for that transaction to complete. This could be a shopper at a grocery stand or a stock trader executing a sell trade. In most of these systems, reads will outnumber writes but in general, OLTP systems perform both reads and writes at much higher rates than data warehouse systems.So why 5 to 7 indexes then? For reads,indexes are great as they allow customization of table entry based upon query design. (i.e. the question being asked). However, keep in mind that every index contains every record within the table. This is an important concept. This means that for DML queries (Updates/Inserts, Deletes) each record that is touched also means that it needs to be written to every nonclustered index which is affected. Let’s examine each of those DML cases separately:
Let’s assume a record needs to have customer information updated -for example a phone number. When looking at the IO needed for an update statement, obviously the table data itself needs to be updated (i.e. the clustered index or heap record) but it also will need to be updated in each index which contains a reference to the phone number column in this case. This could be all the nonclustered indexes, a subset of them or none of them. So count on any updates having to write in multiple places not just one place. It typically isn’t just being written in one place but rather an update of a single record contained within a table consisting of 5 nonclustered indexes and 1 clustered index, an update could have to write this new phone number in up to 6 locations – one for each index and the table itself.
INSERTs and DELETEs:
As mentioned above, every index contains every record within the table. Again – an important concept. Table data is not just kept within the table (i.e. the clustered index or heap record) but rather it is found in copies in every index on the table. If you have 100 million records in your table and 5 nonclustered indexes on that table, your entire record count is duplicated within each of those 5 nonclustered indexes. So, for INSERTs and DELETEs, it is not just creating or removing 1 record. It is actually 1 * as it has to remove or create that record in each of the separate b-trees.
OLTP systems are required to be fast; both in delivering data as well as writing data. With each index you create on a table, you inversely decrease the write performance of queries operating against that table. Reads are unaffected or improved by adding indexes but it is the increase in write IO that ultimately creates the need for a balanced approach to number of indexes found on an OLTP table.
Now let’s examine DW (i.e.BI) systems in terms of index counts.
Data warehouse systems are a much different animal than OLTP systems in that data warehouse systems typically are not required to deliver the same write performance as an OLTP system would require. In addition, data warehouse systems are typically different in numerous ways. Such as:
- Contrary to OLTP systems, DW systems are usually not considered nor required to be “real-time” meaning that data found within a DW system is usually not representative of data values at current date/time. Rather it represents data as it was at some period in the past. i.e. 24 hours ago, yesterday, last week etc.
- While OLTP systems typically contain transaction level detail – i.e. each individual item purchased by a customer, DW systems typically do not need to provide the “drill down” capability to that level. Instead, DW systems provide an aggregate (i.e. roll-up) picture of data. For example, daily, weekly or monthly totals of purchases by customer rather than each item purchased.
- DW systems are typically written to periodically rather than the ad-hoc nature of OLTP systems which can generate writes whenever a customer swipes their loyalty card at the cash register. DW systems receive data typically via ETL (Extract, Transform, Load) processes that run within set intervals. i.e. daily, weekly, monthly etc. This periodic nature of writes eliminates the performance degradation caused by excessive indexes on a table during querying as those queries are read oriented as opposed to the R\W nature of OLTP.
- DW systems tend to be queried from a diverse set of perspectives. Meaning that data is looked at from many different angles as these systems are intended to drive top level decision making in terms of sales, marketing and overall business direction. The need for this drives the need for many different nonclustered indexes allowing table entry from many different perspectives.
Given these significant differences, DW systems can typically handle an exponentially greater number of nonclustered indexes than can OLTP systems. The caveat, of course, is that during load times (i.e. DW refresh cycles) these indexes will typically need to be dropped and recreated after the data is refreshed each cycle. This, of course can also drive the age of the data within the DW system as recreating indexes can be both CPU and disk IO intensive.
Another important aspect is the FILLFACTOR. As with the number of indexes, this is also very different depending on whether you are dealing with an OLTP or DW based system. But I will save that for another blog.