Need help with your Django project?
Check our django servicesIntro
We’re using Django ORM every day in our projects. It’s a powerful tool with a quite intuitive interface.
One of the most common cases is making a SELECT query with a WHERE clause, using the filter() method.
But there’s one tricky exception that we all hit at some point:
Sometimes, when chaining filters together – SomeModels.objects.filter(a=A).filter(b=B)
– we get duplicate objects in our queryset.
In this article, we’ll explore & explain why.
Common case
There are three common ways to use the filter method:
With Q expressions
Author.objects.filter(Q(name='Mihail') & Q(age=20))
SELECT "authors"."id",
"authors"."name",
"authors"."age"
FROM "authors"
WHERE (
"authors"."name" = mihail AND "authors"."age" = 20
)
With **kwargs
Author.objects.filter(name='Mihail', age=20)
SELECT "authors"."id",
"authors"."name",
"authors"."age"
FROM "authors"
WHERE (
"authors"."name" = mihail AND "authors"."age" = 20
)
By chaining
Author.objects.filter(name='Mihail').filter(age=20)
SELECT "authors"."id",
"authors"."name",
"authors"."age"
FROM "authors"
WHERE (
"authors"."name" = mihail AND "authors"."age" = 20
)
When dealing with own fields or ForeignKeys these 3 ways are absolutely equivalent.
Filtering Reverse ForeinKeys
Let’s create 2 simple models
class Author(Model):
name = CharField(max_length=255)
age = PositiveIntegerField()
class Article(Model):
category = CharField(max_length=255)
title = CharField(max_length=255)
author = ForeignKey(Author, related_name='articles', on_delete=CASCADE)
And we have the following objects in the database:
Author table
id | namne | age |
---|---|---|
1 | Bob | 24 |
Article table
id | title | category | author_id |
---|---|---|---|
1 | Django filters | Tech | 1 |
2 | Django models | Other | 1 |
Now let’s try the filter method with the three different interfaces:
Using Q expressions
Author.objects.filter(
Q(articles__category='Tech') &
Q(articles__title__startswith='Django')
)
>>> <QuerySet [<Author id=1 >]>
SELECT "authors"."id",
"authors"."name",
"authors"."age"
FROM "authors"
INNER JOIN "articles"
ON ("authors"."id" = "articles"."author_id")
WHERE ("articles"."category" = 'Tech' AND "articles"."title" LIKE 'Django%' ESCAPE '\')
LIMIT 21
Using a filter with kwargs
Author.objects.filter(
articles__category='Tech',
articles__title__startswith='Django'
)
>>> QuerySet [<Author id=1 >]>
SELECT "authors"."id",
"authors"."name",
"authors"."age"
FROM "authors"
INNER JOIN "articles"
ON ("authors"."id" = "articles"."author_id")
WHERE ("articles"."category" = 'Tech' AND "articles"."title" LIKE 'Django%' ESCAPE '\')
LIMIT 21
Using filter chains
Author.objects\
.filter(articles__category='Tech')\
.filter(articles__title__startswith='Django')
>>> <QuerySet [<Author id=1 >, <Author id=1 >]>
SELECT "authors"."id",
"authors"."name",
"authors"."age"
FROM "authors"
INNER JOIN "articles"
ON ("authors"."id" = "articles"."author_id")
INNER JOIN "articles" T3
ON ("authors"."id" = T3."author_id")
WHERE (
"articles"."category" = 'Tech' AND
T3."title" LIKE 'Django%' ESCAPE '\'
)
LIMIT 21
As we can see, the last queryset produced a different SQL query. Let’s figure out why.
Quick explanation
One of the most powerful features of the Django ORM is the automatic SQL JOIN handling.
Let’s construct the last query again step by step without adding the WHERE clause and see what’s happening:
Author.objects.filter(articles__category=='Tech')
SELECT "authors"."id",
"authors"."name",
"authors"."age",
"articles"."id" as "T1 id",
"articles"."category" as "T1 category",
"articles"."author_id" as "T1 author",
"articles"."title" as "T1 title"
FROM "authors"
INNER JOIN "articles"
ON ("authors"."id" = "articles"."author_id")
-- WHERE "articles"."category" = 'Tech';
Adding the first filter results in an INNER JOIN, since we’re looking up by a reverse foreign key.
id | name | age | T1 id | T1 category | T1 author | T1 title |
---|---|---|---|---|---|---|
1 | Bob | 24 | 2 | Tech | 1 | Django filters |
1 | Bob | 24 | 3 | Other | 1 | Django models |
Here’s how the joined SQL table looks like
Author.objects.filter(articles__category='Tech').filter(articles__title__startswith='Django')
SELECT "authors"."id",
"authors"."name",
"authors"."age",
"T3"."id" as "T1 id",
"T3"."category" as "T1 category",
"T3"."author_id" as "T1 author",
"T3"."title" as "T1 title",
"articles"."id" as "T2 id",
"articles"."category" as "T2 category",
"articles"."author_id" as "T2 author",
"articles"."title" as "T2 title"
FROM "authors"
INNER JOIN "articles"
ON ("authors"."id" = "articles"."author_id")
INNER JOIN "articles" T3
ON ("authors"."id" = T3."author_id")
-- WHERE (
-- "articles"."category" = 'Tech' AND
-- T3."title" LIKE 'Django%' ESCAPE '\'
--);
id | name | age | T1id | T1category | T1author | T1title | T2id | T2category | T2author | T2title |
---|---|---|---|---|---|---|---|---|---|---|
1 | Bob | 24 | 2 | Tech | 1 | Django filters | 2 | Tech | 1 | Djangofilters |
1 | Bob | 24 | 3 | Other | 1 | Django models | 2 | Tech | 1 | Django filters |
1 | Bob | 24 | 2 | Tech | 1 | Django filters | 3 | Other | 1 | Django models |
1 | Bob | 24 | 3 | Other | 1 | Django models | 3 | Other | 1 | Django models |
The new filter means a new INNER JOIN and that’s the “culprit” because inner-joining a reverse foreign key table again it causes results duplicates.
That behavior is caused by the SQL DB rather than Django ORM.
…and finally, we apply the WHERE clause that will match all the duplicated rows and we’ll end up with the QuerySet with duplicated objects inside.
Sample solution
You can distinct() method to remove the duplicates:
Author.objects\
.filter(articles__category='Tech')\
.filter(articles__title__startswith='Django')\
.distinct()
But be careful with distinct() if you make explicit ordering on the same queryset as documented.
Lookup difference
There’s one important lookup difference when you search through reverse ForeignKeys.
Using a filter with Q expressions or kwargs
Author.objects.filter(
Q(articles__category='Tech') &
Q(articles__title__startswith='Django')
)
# or
Author.objects.filter(
articles__category='Tech',
articles__title__startswith='Django'
)
Both of these querysets will search for all authors that have articles with both ‘Tech’ category and title starting with ‘Django’.
Using filter chains
Author.objects\
.filter(articles__category='Tech')\
.filter(articles__title__startswith='Django')
This queryset will search for all authors that have an article from ‘Tech’ category as well as an article with title starting with ‘Django’.
This behaviour is well documented in Django documenation.
Summary
When you’re working with own fields or ForeignKey relations:
.filter(A, B)
means WHERE A AND B.filter(Q(A) & Q(B))
means WHERE A AND B.filter(A).filter(B)
means WHERE A AND B
But when you’re working with reverse ForeignKeys:
.filter(A, B)
means WHERE A AND B.filter(Q(A) & Q(B))
means WHERE A AND B.filter(A).filter(B)
still means WHERE A AND B but A and B works on a table with duplicates caused by the multiple INNER JOINs of the same table
NOTE: This behavior is well known in the Django tickets:
I hope this saves your hours of debugging, especially when the filtering is usually done by a popular library like django-filter. 🙂