MySQL Indexing Basics

You know what happens to a dynamic, database-backed website over time – it gets slower and slower.  At Dreamhost, we’re constantly upgrading server hardware to keep your site happy and responsive, but there might still be other problems.  If your site is backed by a database, it’s likely that data grows over time – and so do query times. Sometimes, old data simply has to be deleted.  Often, though, the problem can be fixed by adding the correct indexes to larger and larger tables.

 How can you start with basic MySQL indexing?

This is a big topic; books have been written about it!  So, I can only offer some basic tips here. First, unless you’re using specific MyISAM features – like full-text searching – you should use InnoDB as your storage engine.  This is actually only important for index topics not covered in this article, but InnoDB in general improves concurrency, provides transactions, better recovery, and many other benefits.

      1. Why indexes in the first place?

The bottleneck in most database systems (this may change with large memory and SSDs) is disk I/O. This usually means decreasing disk I/O is necessary to make queries perform better. Database indexes are similar to the index at the end of a book. In other words, you don’t have to read through the whole book (which in database terms is called a “full table scan”) to search for something. You can simply look up a word and find it. By doing this, I/O is reduced.

       2. How do you add an index?

To add a correct index, identify the query that executes slowly, and then look at the “where” clause of that query – which fields are used for filtering?  These are probably the ones that should be indexed. For example:

SELECT first_name, last_name FROM contacts WHERE city = “Los Angeles”;

Here, if there isn’t already an index on the column “city” (and if the table is more than a few rows), the index should be added:

ALTER TABLE contacts ADD KEY (city);

If you are filtering by multiple fields, like:

SELECT first_name, last_name FROM contacts WHERE status = “active” AND delivery_method = “mail” AND city = “Los Angeles”;

… then you should create a “composite” index:

ALTER TABLE contacts ADD KEY (status, delivery_method, city);

This index will speed up queries that filter by either all of these fields or a subset of them, starting from the left column, as specified in the ALTER TABLE statement.

That means the following where clauses will use the index:

… WHERE status = “active” AND delivery_method = “mail” AND city = “Los Angeles”;

… WHERE status = “active” AND delivery_method = “mail”;

… WHERE status = “active”;

In contrast, where clauses like these will not use this composite index:

… WHERE city = “Los Angeles”;

… WHERE delivery_method = “mail” AND city = “Los Angeles”;

You would have to create another index with the necessary column(s).

       3. What tools does MySQL offer to help with indexing?

MySQL comes with a few commands that help when analyzing potential indexing problems. One of the most important is “EXPLAIN”. Usage is simple: just put EXPLAIN in front of a specific SELECT query, and MySQL will show you which indexes (if any) it would use. Example:

EXPLAIN SELECT first_name, last_name FROM contacts WHERE city = “Los Angeles”;

(I created a small test table consisting of five rows)

mysql> EXPLAIN SELECT first_name, last_name FROM contacts WHERE city = "Los Angeles"\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: contacts
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
Extra: Using where
1 row in set (0.00 sec)

As you see here, possible_keys and keys are set to NULL, which means the MySQL optimizer didn’t find any keys to use. An important column in the EXPLAIN output is “rows”, which is an estimate of how many rows MySQL has to read to find the desired result. Here, it is five, which are all rows in the table. That means the database has to perform a full table scan.

Let’s add an index:

ALTER TABLE contacts ADD KEY (city);

That brings us to another important command – you can use “SHOW INDEX FROM <table>” to find out which indexes are already created on a particular table:

mysql> SHOW INDEX FROM contacts\G
*************************** 1. row ***************************
Table: contacts
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 5
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
*************************** 2. row ***************************
Table: contacts
Non_unique: 1
Key_name: city
Seq_in_index: 1
Column_name: city
Collation: A
Cardinality: 5
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
2 rows in set (0.00 sec)

Here we see two rows – the second row shows the newly added index on the column city. Now if we run the same EXPLAIN-statement again, we get a different – better – result:

mysql> EXPLAIN SELECT first_name, last_name FROM contacts WHERE city = "Los Angeles"\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: contacts
type: ref
possible_keys: city
key: city
key_len: 103
ref: const
rows: 1
Extra: Using where
1 row in set (0.00 sec)

Notice that the possible_keys and keys columns both contain the field “city”, which means that MySQL was able to use this index to satisfy the query. Another important difference is the number of rows MySQL estimates it will have to search – it’s now only one!  The database is now going directly from the index to the row (which is referenced by the primary key).

Now, in this example, the difference between reading one or five rows might not seem that significant, but imagine how this helps when you have a table with millions of rows – the index reduces the necessary rows to read to just a few hundred!  You might be wondering: if indexes are that great, why couldn’t we just add an index on every column and every combination of columns of the table, and be done with it?  The problem is, the database management system also has to maintain the indexes, which usually happens after INSERT, DELETE, and UPDATE statements.  In addition to that, indexes also need disk space. So, it’s all a trade-off – the more indexes you add, the faster your SELECT queries might be, but they might slow down data modifications.

Keep in mind, this is a basic introduction to database indexes, but I hope it gives you a starting point to optimize your queries.  If you have any questions, or feedback, please share below in the comments!