Django filter – filter(A).filter(B) vs filter(A, B)

In this article, we are exploring the behavior of Django ORM when it comes to filter chaining & reverse foreign key relations.

Intro

We’re using Django ORM every day in our projects. it’s a really powerful tool with quite intuitive interface.

One of the most common cases is making a SELECT query with a WHERE clause, using the filter() method.

But there’s one tricky exception that we all hit at some point:
Sometimes, when chaining filters together –  SomeModels.objects.filter(a=A).filter(b=B) we get duplicate objects in our queryset.

In this article, we’ll explore & explain why.

Common case

There’s three common ways to use the filter method:

With Q expressions

Author.objects.filter(Q(name='Mihail') & Q(age=20))
SELECT "authors"."id",
       "authors"."name",
       "authors"."age"
FROM   "authors"
WHERE  (
    "authors"."name" = mihail AND "authors"."age" = 20
)

With **kwargs

Author.objects.filter(name='Mihail', age=20)
SELECT "authors"."id",
       "authors"."name",
       "authors"."age"
FROM   "authors"
WHERE  (
    "authors"."name" = mihail AND "authors"."age" = 20
)

By chaining

Author.objects.filter(name='Mihail').filter(age=20)
SELECT "authors"."id",
       "authors"."name",
       "authors"."age"
FROM   "authors"
WHERE  ( 
    "authors"."name" = mihail AND "authors"."age" = 20 
)

When dealing with own fields or ForeignKeys these 3 ways are absolutely equivalent.

Filtering Reverse ForeinKeys

Let’s create 2 simple models

class Author(Model):
    name = CharField(max_length=255)
    age = PositiveIntegerField()

class Article(Model):
    category = CharField(max_length=255)
    title = CharField(max_length=255)
    author = ForeignKey(Author, related_name='articles', on_delete=CASCADE)

And we have following objects in the database:

Author table

id name age
1 Bob 24

Article table

id title category author_id
1 Django filters Tech 1
2 Django models Other 1

Now let’s try the filter method with the three different interfaces:

Using Q expressions

Author.objects.filter(
    Q(articles__category='Tech') &
    Q(articles__title__startswith='Django')
)

>>> <QuerySet [<Author id=1 >]>
SELECT "authors"."id",
      "authors"."name",
      "authors"."age"
 FROM "authors"
INNER JOIN "articles"
   ON ("authors"."id" = "articles"."author_id")
WHERE ("articles"."category" = 'Tech' AND "articles"."title" LIKE 'Django%' ESCAPE '\')
LIMIT 21

Using filter with kwargs

Author.objects.filter(
    articles__category='Tech',
    articles__title__startswith='Django'
)

>>> QuerySet [<Author id=1 >]>
SELECT "authors"."id",
       "authors"."name",
       "authors"."age"
  FROM "authors"
 INNER JOIN "articles"
    ON ("authors"."id" = "articles"."author_id")
 WHERE ("articles"."category" = 'Tech' AND "articles"."title" LIKE 'Django%' ESCAPE '\')
 LIMIT 21

Using filter chains

Author.objects\
      .filter(articles__category='Tech')\
      .filter(articles__title__startswith='Django')

>>> <QuerySet [<Author id=1 >, <Author id=1 >]>
SELECT "authors"."id",
       "authors"."name",
       "authors"."age"
  FROM "authors"
 INNER JOIN "articles"
    ON ("authors"."id" = "articles"."author_id")
 INNER JOIN "articles" T3
    ON ("authors"."id" = T3."author_id")
 WHERE (
   "articles"."category" = 'Tech' AND
   T3."title" LIKE 'Django%' ESCAPE '\'
 )
 LIMIT 21

As we can see, the last queryset produced a different SQL query. Let’s figure out why.

Quick explanation

One of the most powerful features of the Django ORM is the automatic SQL JOIN handling.

Let’s construct the last query again step by step without adding the WHERE clause and see what’s happening:

Author.objects.filter(articles__category=='Tech')
SELECT "authors"."id",
       "authors"."name",
       "authors"."age",
       "articles"."id" as "T1 id",
       "articles"."category" as "T1 category",
       "articles"."author_id" as "T1 author",
       "articles"."title" as "T1 title"
FROM "authors"
  INNER JOIN "articles"
    ON ("authors"."id" = "articles"."author_id")
-- WHERE "articles"."category" = 'Tech';

Adding the first filter results in an INNER JOIN, since we’re looking up by a reverse foreign key.

id name age T1 id T1 category T1 author T1 title
1 Bob 24 2 Tech 1 Django filters
1 Bob 24 3 Other 1 Django models

Here’s how the joined SQL table looks like

Author.objects.filter(articles__category='Tech').filter(articles__title__startswith='Django')
SELECT "authors"."id",
       "authors"."name",
       "authors"."age",
       "T3"."id" as "T1 id",
       "T3"."category" as "T1 category",
       "T3"."author_id" as "T1 author",
       "T3"."title" as "T1 title",
       "articles"."id" as "T2 id",
       "articles"."category" as "T2 category",
       "articles"."author_id" as "T2 author",
       "articles"."title" as "T2 title"
FROM "authors"
  INNER JOIN "articles"
    ON ("authors"."id" = "articles"."author_id")
  INNER JOIN "articles" T3
    ON ("authors"."id" = T3."author_id")
-- WHERE (
--  "articles"."category" = 'Tech' AND
--  T3."title" LIKE 'Django%' ESCAPE '\'
--);
id name age T1
id
T1
category
T1
author
T1
title
T2
id
T2
category
T2
author
T2
title
1 Bob 24 2 Tech 1 Django filters 2 Tech 1 Django

filters

1 Bob 24 3 Other 1 Django models 2 Tech 1 Django filters
1 Bob 24 2 Tech 1 Django filters 3 Other 1 Django models
1 Bob 24 3 Other 1 Django models 3 Other 1 Django models

The new filter means a new INNER JOIN and that’s the “culprit” because inner-joining a reverse foreign key table again it causes results duplicates.

That behavior is caused by the SQL DB rather than Django ORM.

…and finally we apply the WHERE clause that will match all the duplicated rows and we’ll end up with the QuerySet with duplicated objects inside.

Sample solution

You can distinct() method to remove the duplicates:

Author.objects\
      .filter(articles__category='Tech')\
      .filter(articles__title__startswith='Django')\
      .distinct()

But be careful with distinct() if you make explicit ordering on the same queryset as documented.

Lookup difference

There’s one important lookup difference when you search through reverse ForeignKeys.

Using filter with Q expressions or kwargs

Author.objects.filter(
    Q(articles__category='Tech') &
    Q(articles__title__startswith='Django')
)

# or

Author.objects.filter(
    articles__category='Tech',
    articles__title__startswith='Django'
)

Both of these querysets will search for all authors that have articles with both ‘Tech’ category and title starting with ‘Django’.

Using filter chains

Author.objects\ 
      .filter(articles__category='Tech')\
      .filter(articles__title__startswith='Django')

This queryset will search for all authors that have an article from ‘Tech’ category as well as an article with title starting with ‘Django’.

This behaviour is well documented in Django documenation.

Summary

When you’re working with own fields or ForeignKey relations:

  • .filter(A, B) means WHERE A AND B
  • .filter(Q(A) & Q(B)) means WHERE A AND B
  • .filter(A).filter(B) means WHERE A AND B

But when you’re working with reverse ForeignKeys:

  • .filter(A, B) means WHERE A AND B
  • .filter(Q(A) & Q(B)) means WHERE A AND B
  • .filter(A).filter(B) still means WHERE A AND B but A and B works on a table with duplicates caused by the multiple INNER JOINs of the same table

NOTE: This behavior is well known in the django tickets:

I hope this saves your hours of debugging, especially when the filtering is usually done by a popular library like django-filter. 🙂

Share This:
There are 0 comments

Leave a Reply

Your email address will not be published. Required fields are marked *