While I absolutely love the feeling of watching a query that used to take 3 minutes get reduced down to 0.00 second, it really is a headache trying to figure out how indexes work in MySQL. Or any database for that matter.

Say I have a table named employees. This table is huge, containing 50 million rows, and it will keep on growing. I have a query that selects when was last employee hired at a particular team. (This is not the actual situation; I’m using analogies) So the query looks like:

SELECT MAX(join_time) FROM employee WHERE team_id = 1;

So the first thing I did was to add an index (join_time, team_id). When I EXPLAIN the query above, it says that the query optimizer will use the index (join_time, team_id), but the estimated number of rows to scan for the result is still 50 million. And of course, the query still took 3 minutes to execute. So I was thinking why the query would still effectively do a full table scan. It didn’t really make sense to me. Perhaps the DB has to make sure the maximum join_time of the index really has team_id =1. For example, maybe team #1 stopped hiring a long time ago, so the maximum join_time in the index doesn’t belong to team #1. This would result in more scans. But in my case, the maximum join_time does belong to team #1. The DB should be smart enough to look at the first record in the index and stop there.  … I’m sure I completely misunderstand how DB indexes work…

So the next thing I tried was the index (team_id, join_time). And ran EXPLAIN.

And ta-da! The EXPLAIN says no indices are used, with “Select tables optimized away” in the Extra column. That means the optimizer decided that it only needs to look at one record! So the query took 0.00 seconds to execute; which is infinitely faster than before. Hm, can I put “optimized a critical query to run infinitely faster” on my resume…? ;)

Nevertheless, that was a stroke of luck. I am still not sure why the first index didn’t work.

Advertisements