Need help with your Django project?
Check our django servicesDjango related fields & the caveats of implicit SQL INNER JOINs
Django's ORM is one of the most well-designed features of the framework. It provides a wrapper around the database that allows developers to handle database operations with ease and predictability.
At least most of the time.
N.B. The code structure in this article follows HackSoft's Django style guide. Terminologies like "selector", which will be used throughout the text, are described in detail there. βοΈ
Model structure
For the purpose of this article, we'll use the following models:
class User(Model):
...
class School(Model):
name = CharField(max_length=256)
users = ManyToManyField(
User,
through='SchoolMember',
related_name='schools'
)
class SchoolMember(Model):
school = ForeignKey(
School,
on_delete=models.CASCADE,
related_name='members'
)
user = ForeignKey(
User,
on_delete=models.CASCADE,
related_name='school_memberships'
)
Or simply said:
- A
User
model. - A
School
model which has a many-to-many relation toUser
via theSchoolMember
through model. - A
User
is considered part of aSchool
if he/she has a respectiveSchoolMember
.
The requirement
With this model structure, we have a simple task - get all the schools for a given user.
One specific requirement is to compute the count of all memberships for each school in the result.
Imagine having the following setup:
User
x3 -User 1
,User 2
&User 3
School
x2 -School 1
&School 2
User 1
is aSchoolMember
in bothSchool 1
&School 2
User 2
is aSchoolMember
inSchool 1
User 3
is aSchoolMember
inSchool 2
The desired result when fetching all the schools for User 1
would be (as JSON):
[
{
"id": 1,
"name": "School 1",
"members_count": 2
},
{
"id": 2,
"name": "School 2",
"members_count": 2
}
]
The go-to approach
As we've been using Django for quite some time - this task looks straightforward. We will get all schools for the user and we will annotate the members_count
via the Count
aggregation function in Django.
We've implemented this simple selector:
def user_get_schools(*, user: User) -> QuerySet[School]:
return user.schools.annotate(members_count=Count('members', distinct=True))
We thought this will compute the desired result described in the requirement, but surprisingly - it was wrong. When we call the selector with User 1
, we'll get this result:
[
{
"id": 1,
"name": "School 1",
"members_count": 1 # <-- Incorrect β
},
{
"id": 2,
"name": "School 2",
"members_count": 1 # <-- Incorrect β
}
]
The returned schools are the correct ones, but the members_count
is equal to 1
. If we call the selector with another user - this value stays the same. We expect both schools to have members_count
equal to 2
, since we have two school members in each of them.
Inspecting the query
The easiest way to find what is yielding the wrong result is to inspect the raw SQL query. We've used django-debug-toolbar
for the job.
The SQL query for the request looks like this:
SELECT
"schools_school"."id",
"schools_school"."name",
COUNT(
DISTINCT "schools_schoolmember"."id"
) AS "members_count"
FROM
"schools_school"
INNER JOIN "schools_schoolmember" ON (
"schools_school"."id" = "schools_schoolmember"."school_id"
)
WHERE
"schools_schoolmember"."user_id" = 1
As we're accessing user.schools
via the related_name
of the field for the many-to-many relationship, Django performs an INNER JOIN
behind the scenes in order to compute the results.
Then as we try to aggregate the members_count
in the SELECT
- the results are stripped down only to the ones from the INNER JOIN
(the blue selection):
This is wrong because the INNER JOIN
selects records that have matching values in both tables. And since the SchoolMember
s are filtered behind the scenes (see the WHERE
clause in the SQL queryπ), the COUNT
aggregation "sees" only these results.
That's why we will constantly see the members_count
value to actually represent the current user's members count in a school and not the count of all members in a school, which is the requirement.
This is how Django works by default - it performs an implicit INNER JOIN
in order to return related entries. In most cases, this behavior is correct and you will get correct data.
In some happy place, if we just wanted to return the id
and the name
of the schools, without data from external tables - this blog post would've never existed.
But like everything in software - there are some caveats. π
It's not a bug, it's a feature.
The correct implementation
Once we know what is causing the problem, the solution comes easy.
We need to rework the selector a bit so that it does not perform an INNER JOIN
behind the scenes:
def user_get_schools(*, user: User) -> QuerySet[School]:
user_schools_ids = user.schools.values_list('id', flat=True)
return School.objects\
.filter(id__in=user_schools_ids)\
.annotate(members_count=Count('members', distinct=True))
Here we use the related name again (user.schools
), but this time we save the list of these IDs in a separate variable (user_schools_ids
). Then when constructing the result query, we use the School.objects
manager directly and filter the queryset with the id__in
lookup so that we still get the correct user schools.
Now the query looks like this:
SELECT
"schools_school"."id",
"schools_school"."name",
COUNT(
DISTINCT "schools_schoolmember"."id"
) AS "members_count"
FROM
"schools_school"
LEFT OUTER JOIN "schools_schoolmember" ON (
"schools_school"."id" = "schools_schoolmember"."school_id"
)
WHERE
(
"schools_school"."id" IN (
SELECT
U0."id"
FROM
"schools_school" U0
INNER JOIN "schools_schoolmember" U1 ON (U0."id" = U1."school_id")
WHERE
U1."user_id" = 1
)
And respectfully, the JSON result is now correct:
[
{
"id": 1,
"name": "School 1",
"members_count": 2 # <-- Correct β
},
{
"id": 2,
"name": "School 2",
"members_count": 2 # <-- Correct β
}
]
As you can see, Django now performs a LEFT OUTER JOIN
when computing the query, instead of an INNER JOIN
. If you look closely, it still performs an INNER JOIN
, but this time it is for the inner SELECT
query that we have in the WHERE
clause.
Visualized as a Venn diagram, it looks like this:
You can clearly see that we now have the full set of SchoolMembers
that we need while still filtering the Schools
, thanks to the LEFT OUTER JOIN
. π
I hope that this article will help you learn more about Django ORM's behind-the-scenes mechanics when using related sets. For more interesting Django blog posts - check out the rest of the Django-related articles in our blog. π