I was optimizing a Django application I’m working on the other day using Simon Willison’s excellent DebugFooter middleware, which adds a footer to each page showing which SQL queries were executed by Django when generating the page.
I’m a bit of a caching addict so I had already added a caching layer on top of my models, and thus I was quite surprised to find that the most important page on the site still generated 5-15 SQL queries on every access, even though the objects it was accessing supposedly were cached.
The objects were indeed cached, but every time I was accessing one of the ForeignKey fields on the model objects Django generated a SQL query to find the data for the related object. This could quickly turn nasty if you follow such relationships in a loop on a high-traffic web site.
The solution was the select_related() QuerySet method. Borrowing from the Django documentation, a normal ORM lookup would look like this:
e = Entry.objects.get(id=5) b = e.blog
This would generate two SQL queries, one to fetch the entry object and one to fetch the blog object once it’s referred to from the entry object. The same example with select_related() becomes:
e = Entry.objects.select_related().get(id=5) b = e.blog
This example only generates one SQL query, albeit bigger and slower than each of the individual queries in the first example because of the necessary join between the model tables to find all the data in one go. However, this doesn’t matter if the fetched object will go directly into a cache anyway and stay there for a possibly rather long time, which was the case for me.