Performance tips for Django applications
Performance is a typical concern when developing applications. To achieve good back-end performance, it is important to be aware of our program's memory footprint, CPU usage, database access patterns, and so on. If no precautions are taken, any of these can quickly become a bottleneck and hurt the overall performance of our system. This post presents a collection of performance tips that we can apply in our Django applications to save ourselves from headaches in the future.
Use .count() instead of len() in querysets
```python
musics = Music.objects.all()
count = musics.count()  # not len(musics)
```
Using .count() is faster since it uses the COUNT() function at the database level. Calling len() forces the queryset to be evaluated and retrieves results that we will not use if all we want to do is count how many objects there are.
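The difference is easy to see at the SQL level. Django aside, here is a minimal sketch using the standard library's sqlite3 module (the music table and its contents are made up), comparing a COUNT(*) query with fetching every row just to count it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE music (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO music (title) VALUES (?)",
                 [("song %d" % i,) for i in range(10000)])

# What .count() does: the database counts, a single number travels back.
(count,) = conn.execute("SELECT COUNT(*) FROM music").fetchone()

# What len(queryset) forces: every row is fetched, then counted in Python.
rows = conn.execute("SELECT * FROM music").fetchall()

print(count, len(rows))  # 10000 10000
```

Both answers are the same, but the second one transfers and materialises ten thousand rows to get it.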
Use a combination of .filter() and .exists() to test existence and membership
```python
musics = Music.objects.filter(title='Django rocks')
if musics.exists():
    print('We have music!')
```
Django provides an .exists() method that we can use instead of counting objects with .count() or testing entries for inclusion with obj in queryset.
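Under the hood, an existence check only needs the database to find at most one matching row and stop. A sketch of the idea in plain SQL with the stdlib sqlite3 module (hypothetical music table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE music (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO music (title) VALUES ('Django rocks')")

# What .exists() boils down to: the database stops at the first match
# and returns a single boolean, instead of fetching any rows.
(found,) = conn.execute(
    "SELECT EXISTS(SELECT 1 FROM music WHERE title = ?)",
    ("Django rocks",),
).fetchone()
print(bool(found))  # True
```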
Delay queryset evaluations
```python
musics = Music.objects.all()  # no database access yet
first_music = musics[0]       # the query runs here
```
Django querysets are lazy, i.e., they are only evaluated (hitting the database) when strictly necessary, so we should create and combine querysets before performing operations such as iteration, len() or slicing, which force the results to be fetched from the database. Round trips to the database are expensive.
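The same build-first, evaluate-late pattern can be illustrated with plain Python generators (an analogy, not Django code): we can stack transformations for free, and work only happens when the pipeline is consumed.

```python
# Building the pipeline costs nothing; no element is produced yet.
numbers = (n * n for n in range(1_000_000))
evens = (n for n in numbers if n % 2 == 0)

# Work only happens when the pipeline is consumed.
first_five = [next(evens) for _ in range(5)]
print(first_five)  # [0, 4, 16, 36, 64]
```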
Avoid caching mechanisms for one time operations
```python
musics = Music.objects.all()
for music in musics.iterator():  # bypasses the queryset cache
    print(music.title)
```
The .iterator() method bypasses the queryset's internal caching and can be useful when we know we will not reuse the objects. It also greatly reduces the memory footprint, which helps when loading millions of rows from the database.
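The caching trade-off is the same one we see with Python lists versus generators; a small sketch (no Django involved) comparing the memory held by a fully materialised sequence against a streamed one:

```python
import sys

materialised = [n for n in range(1_000_000)]  # every item held in memory
streamed = (n for n in range(1_000_000))      # produces one item at a time

list_size = sys.getsizeof(materialised)
gen_size = sys.getsizeof(streamed)
total = sum(streamed)  # still visits every value, without storing them

print(gen_size < list_size)  # True
```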
Fetch only the required columns
```python
# returns a dict for each object
musics = Music.objects.values('title')
```
or
```python
# returns a list of tuples (pass flat=True for plain values)
musics = Music.objects.values_list('title')
```
These methods avoid creating full model instances and retrieve only the desired field values, saving the extra work of fetching columns we do not need.
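In plain SQL terms, this is simply selecting fewer columns. A sketch with the stdlib sqlite3 module (the music table and its lyrics column are made up) showing that only the requested column is fetched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE music (id INTEGER PRIMARY KEY, title TEXT, lyrics TEXT)")
conn.execute(
    "INSERT INTO music (title, lyrics) VALUES ('Django rocks', 'la la la')")

# Only the needed column is fetched; the wide lyrics column stays behind.
titles = [row[0] for row in conn.execute("SELECT title FROM music")]
print(titles)  # ['Django rocks']
```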
Fetch related objects in a single batch
```python
# fetches related many-to-many and many-to-one objects
# ('albums' is an example related field)
musics = Music.objects.prefetch_related('albums')
```
```python
# fetches foreign key relations and one-to-one objects
# ('artist' is an example foreign key field)
musics = Music.objects.select_related('artist')
```
These methods retrieve the related objects in advance, avoiding extra queries later. Note that the results are all cached in memory, which may or may not be desirable.
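The problem these methods solve is the classic N+1 query pattern. A sketch with the stdlib sqlite3 module (hypothetical music and artist tables) contrasting one query per related object against a single JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE music (id INTEGER PRIMARY KEY, title TEXT, artist_id INTEGER);
    INSERT INTO artist VALUES (1, 'Anna'), (2, 'Bob');
    INSERT INTO music VALUES (1, 'song 1', 1), (2, 'song 2', 2), (3, 'song 3', 1);
""")

# N+1 pattern: one query for the songs, then one more per song for its artist.
queries = 1
for title, artist_id in conn.execute("SELECT title, artist_id FROM music").fetchall():
    conn.execute("SELECT name FROM artist WHERE id = ?", (artist_id,)).fetchone()
    queries += 1

# The batched alternative: a single JOIN fetches everything in one round trip.
joined = conn.execute(
    "SELECT music.title, artist.name"
    " FROM music JOIN artist ON artist.id = music.artist_id"
).fetchall()

print(queries, len(joined))  # 4 3
```

With three songs the difference is four queries versus one; with a million it is a million and one versus one.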
Page results
```python
from django.core.paginator import Paginator

musics = Music.objects.all()
pages = Paginator(musics, per_page=100)
```
Pagination avoids loading all the objects into memory at once. It drastically reduces memory usage, since it fetches slices of our dataset, one chunk of rows at a time, from the database.
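At the SQL level, pagination boils down to LIMIT and OFFSET. A sketch with the stdlib sqlite3 module (hypothetical music table, made-up page size):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE music (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO music (title) VALUES (?)",
                 [("song %d" % i,) for i in range(95)])

PAGE_SIZE = 25

def page(number):
    """Fetch one page (1-indexed) instead of the whole table."""
    offset = (number - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT title FROM music LIMIT ? OFFSET ?", (PAGE_SIZE, offset)
    ).fetchall()

print(len(page(1)), len(page(4)))  # 25 20
```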
Use bulk_create() to insert a batch of records
```python
musics = [Music(title='song %d' % i) for i in range(10000)]
Music.objects.bulk_create(musics)
```
Each time we call the .save() method on a model instance, a round trip to the database is performed, and signals are sent for each save operation. This can quickly add up to a huge overhead when dealing with thousands or millions of records. A possible workaround is the bulk_create() method, which inserts the records in a single query: we only need to pass it the list of objects we wish to persist, and they are written in a single database round trip. However, it is important to note that custom save() methods and signals will not be called.
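Conceptually, this is a batched INSERT. A sketch of the same idea with the stdlib sqlite3 module, using executemany() to write the whole batch in one statement instead of one INSERT per object:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE music (id INTEGER PRIMARY KEY, title TEXT)")

# One batched statement instead of 10000 individual INSERTs.
rows = [("song %d" % i,) for i in range(10000)]
conn.executemany("INSERT INTO music (title) VALUES (?)", rows)

(count,) = conn.execute("SELECT COUNT(*) FROM music").fetchone()
print(count)  # 10000
```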
Use distributed and asynchronous processing
Concurrency libraries such as Tornado, Twisted or the standard library's asyncio provide non-blocking behavior and asynchronous I/O, which is great for I/O-bound tasks such as reading from and writing to disk or the network. Celery is also great for performing distributed and CPU-bound background tasks.
Here is an example of a hypothetical processing of electricity consumption bills, using the backport version of asyncio for Python 2.7.
```python
from django.conf import settings
```
```shell
$ python manage.py electric_bills
```
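Django aside, the asynchronous approach can be sketched with the modern asyncio module (Python 3.7+). The customer ids and bill amounts below are made up, and asyncio.sleep() stands in for a real network or disk call:

```python
import asyncio

async def fetch_bill(customer_id):
    # Stand-in for a slow I/O operation (network request, disk read, ...).
    await asyncio.sleep(0.1)
    return {"customer": customer_id, "amount": 10 * customer_id}

async def main():
    # The three simulated calls overlap instead of running back to back.
    return await asyncio.gather(*(fetch_bill(i) for i in range(1, 4)))

bills = asyncio.run(main())
print(bills)
```

Because the three awaits overlap, the whole batch takes roughly the time of one call rather than the sum of all of them.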
These small tips make a noticeable difference when dealing with huge datasets, and they are a good long-term investment in performance.