Optimize Django build to run faster on GitHub Actions

Optimize build time for a Django project running on GitHub Actions. A couple steps for faster tests and smoother workflow.

Optimize Django build to run faster on GitHub Actions

We believe that all builds should be really fast because:

  • Human time is way more valuable then compute time. That is why developers should not waste time waiting for builds to pass.
  • Releasing software frequently is important and slow builds should not delay releases.
  • Going for a coffee until the build is done takes you out of the flow.

If you want to get your Django project up and running on GitHub Actions. Check our previous article about GitHub Actions with Django and Postgres.

Initial setup

In this article, we will optimize a Django build that managed to sneak over the 5 min mark. In this particular case, we are using py.test to run our tests and Github Actions as CI. Nevertheless, all the points in this article are applicable to other test frameworks and continuous integration services as well.

We are starting with a standard Django API backend with ~1100 tests most of which hit a PostgreSQL database. The tests solely need 3min 30sec to pass locally.

$ time py.test 
py.test  148.41s user 4.03s system 72% cpu 3:31.69 total

And when you combine that with all the other build steps it adds up to 6min 22sec.

Create а settings file for you tests

It is a great idea to add a separate settings file for your test settings. That way you can set some Django settings only while running your tests. If you’re using Django Cookiecutter you are probably already familiar with this concept.

$ touch config/settings/test.py
from .base import *  # Load everything from the base settings

Then just tell py.test to load this settings file on every run by pasting this to your pytest.ini

[pytest]
DJANGO_SETTINGS_MODULE=config.settings.test

Change Django password hashers

To test different scenarios we create users in our tests. Creating a user in Django by default is slow because of the secure algorithm that is used to hash the password.

In your test.py you can set your password hasher to something faster like MD5.

PASSWORD_HASHERS = ['django.contrib.auth.hashers.MD5PasswordHasher']

Before:

$ time py.test 
py.test  148.41s user 4.03s system 72% cpu 3:31.69 total

After:

$ time py.test 
py.test  142.21s user 2.63s system 71% cpu 3:25.96 total

It is not a huge improvement but it is almost free.

Turn off Django debug mode

Django is slower in debug mode. So tests are slower if your Django is in debug mode too.

Before:

time py.test 
py.test  142.21s user 2.63s system 71% cpu 3:25.96 total

After:

time py.test 
py.test  137.21s user 2.63s system 71% cpu 3:15.96 total

Run your tests with --nomigrations

When you run your tests, the test runner gets all your migrations and tries to apply them one by one in order to create a test database. In projects with a lot of migrations that may be extremely time-consuming.

--nomigrations Gets your models and constructs and creates the database directly from them. In our case that saves a minute on every build.

time py.test 
py.test  137.21s user 2.63s system 71% cpu 3:15.96 total
time py.test --nomigrations
py.test --nomigrations  85.86s user 1.96s system 68% cpu 2:08.22 total

Keep in mind that by skipping the migrations, you’re not checking for migration conflicts. They appear often if you are in a larger team. So you should run your migrations in a separate parallel build step.

Cache dependencies on CI

Installing dependencies on a CI always takes time, even though most of the CI services have cache layers for the most common package managers. GitHub Actions recommends caching the ~/.cache/pip directory, but even with cache, pip makes HTTP requests to check for newer versions of every package. This may take more than a minute (depending on the number of dependencies you have)

A nice hack is to cache the whole python environment. That will reduce the time needed for package installation to a couple of seconds.

In this example with GitHub Actions, we use the content of the local.txt and the base.txt

 steps:
  - uses: actions/checkout@v1
  - name: Set up Python 3.6.10
    uses: actions/setup-python@v1
    with:
      python-version: 3.6.10
  - name: Cache pip
    uses: actions/cache@v1
    with:
      path: /opt/hostedtoolcache/Python/3.6.10/x64/ # This path is specific to Ubuntu
      # If the requirements files change, the cache will not hit. 
      key: python-${{ hashFiles('requirements/local.txt') }}-${{ hashFiles('requirements/base.txt') }}

Parallelism with pytest-xdist

py.test normally creates a test database and then executes the tests one by one. In many cases, this does not really utilize the maximum potential of the hardware.


Using pytest-xdist you can run your tests in parallel. For example, if you have 4 available cores you can run py.test --nomigrations -n 4. This will create 4 test databases, separate your tests in 4 groups, and run them in parallel.


Using -n will set the number of processes to equal the number of cores on the machine.

$ time py.test --nomigrations -n auto  
py.test --nomigrations -n auto  177.71s user 5.62s system 175% cpu 1:44.76 total

You can achieve the same using python manage.py test –parallel

Final result

After all that effort we managed to reduce the build time from 6:22 to 3:00 which is more than 50% ⬇️

If that is not enough then get more powerful hardware

In many cases, you just have a lot to test. Some tests are slower by design and you can’t just skip them. If your build utilizes the full potential of your hardware and the tests are still slower than you’d like – it is just time for better hardware.

Out of the box, GitHub Actions gives you 2 cores and 7GB of RAM. This is more than enough to start, but when the project gets big and you have a lot of tests in your suite you may consider hosting your own GitHub Actions runner.

Running a system with more cores will significantly boost the build time. Most of the CI services offer either some kind of self-hosted solution or running on the hardware of your choice.

At the time of writing, GitHub Actions doesn’t have support for choosing different hardware to run your tests on. In their forum, they promised to include it at some point, but if you need more power right away, you’ll have to host some workers on your own.

And finally a couple of bad practices.

Don’t use SQLite to run your tests faster.

There are many articles on the web that suggest running with an in-memory SQLite server to save time during builds. We don’t recommend doing this. In order to tests that your code would work properly on production, you need to test with exactly the same database while you are running tests. Even the version should be the same.

Many times your codebase depends on specific PostgreSQL features that are not implemented in SQLite or even worst they are implemented but they work differently.

Don’t mock every database call.

Mocking is great for testing small units but sometimes you need to write tests that test the whole picture. Although hitting the database is slow you need to do it to ensure that your codes work with the database correctly.

Yes the Django ORM is tested and it works fine but are you using it correctly?

Resources

Share This:
There are 0 comments

Leave a Reply

Your email address will not be published. Required fields are marked *