Jay Pipes, a MySQL employee, gives a lecture at Google on MySQL Optimization. I’ve been in several discussions on how to optimize large queries, especially during my time working on TagJungle*.
This video has opened my eyes to some query practices that I used to do that are sub-optimal, and how to correct those problems.
One of my favorite optimizations that Jay Pipes demonstrates is that you can actually run a query like:
SELECT * FROM users WHERE email LIKE '%@gmail.com'; and have MySQL use an index to help perform that query. It’s really quite clever, and I used to think that you were just out of luck with a slow query if you ever wanted this type of information. I would post the solution to this problem here, but the video is beneficial enough to just recommend watching the entire video.
Another big eye opener for me was the concept of Correlated Subqueries (which aren’t a good thing to do). Here’s an example of a correlated subquery:
(SELECT COUNT(*) FROM Reservation WHERE confirmed<>'N' AND campId=C.campID) AS Confirmed,
(SELECT COUNT(*) FROM Reservation WHERE confirmed='N' AND campId=C.campID) AS Unconfirmed
FROM Camp C
This is bad because each of the subqueries must be executed for each camp record. This makes it so the query doesn’t scale well as the number of camps grow.
To fix this problem, it is suggested to think of these types of queries in sets rather than as procedural operations. The above query could be executed like so:
SELECT C.name, Confirmed, Unconfirmed
FROM Camp C LEFT JOIN
(SELECT campId, COUNT(*) as Confirmed
FROM Reservation WHERE confirmed<>'N'
GROUP BY campId) AS ConfTable
(SELECT campId, COUNT(*) as Unconfirmed
GROUP BY campId) AS UnconfTable
This is provides much faster execution as only 3 lookups need to be made to the db and then just joined together. This will scale up as more records are added to the database.
Here’s the video:
* Disclaimer: I no longer am an employee of Tag Jungle or 42Co.