What is the "N+1 selects problem" in ORM (Object-Relational Mapping)?

ID : 481

viewed : 104

Tags : ormselect-n-plus-1orm





Top 5 Answer for What is the "N+1 selects problem" in ORM (Object-Relational Mapping)?

vote vote

94

Let's say you have a collection of Car objects (database rows), and each Car has a collection of Wheel objects (also rows). In other words, CarWheel is a 1-to-many relationship.

Now, let's say you need to iterate through all the cars, and for each one, print out a list of the wheels. The naive O/R implementation would do the following:

SELECT * FROM Cars; 

And then for each Car:

SELECT * FROM Wheel WHERE CarId = ? 

In other words, you have one select for the Cars, and then N additional selects, where N is the total number of cars.

Alternatively, one could get all wheels and perform the lookups in memory:

SELECT * FROM Wheel 

This reduces the number of round-trips to the database from N+1 to 2. Most ORM tools give you several ways to prevent N+1 selects.

Reference: Java Persistence with Hibernate, chapter 13.

vote vote

88

What is the N+1 query problem

The N+1 query problem happens when the data access framework executed N additional SQL statements to fetch the same data that could have been retrieved when executing the primary SQL query.

The larger the value of N, the more queries will be executed, the larger the performance impact. And, unlike the slow query log that can help you find slow running queries, the N+1 issue won’t be spot because each individual additional query runs sufficiently fast to not trigger the slow query log.

The problem is executing a large number of additional queries that, overall, take sufficient time to slow down response time.

Let’s consider we have the following post and post_comments database tables which form a one-to-many table relationship:

The post and post_comments tables

We are going to create the following 4 post rows:

INSERT INTO post (title, id) VALUES ('High-Performance Java Persistence - Part 1', 1)   INSERT INTO post (title, id) VALUES ('High-Performance Java Persistence - Part 2', 2)   INSERT INTO post (title, id) VALUES ('High-Performance Java Persistence - Part 3', 3)   INSERT INTO post (title, id) VALUES ('High-Performance Java Persistence - Part 4', 4) 

And, we will also create 4 post_comment child records:

INSERT INTO post_comment (post_id, review, id) VALUES (1, 'Excellent book to understand Java Persistence', 1)   INSERT INTO post_comment (post_id, review, id) VALUES (2, 'Must-read for Java developers', 2)   INSERT INTO post_comment (post_id, review, id) VALUES (3, 'Five Stars', 3)   INSERT INTO post_comment (post_id, review, id) VALUES (4, 'A great reference book', 4) 

N+1 query problem with plain SQL

If you select the post_comments using this SQL query:

List<Tuple> comments = entityManager.createNativeQuery("""     SELECT         pc.id AS id,         pc.review AS review,         pc.post_id AS postId     FROM post_comment pc     """, Tuple.class) .getResultList(); 

And, later, you decide to fetch the associated post title for each post_comment:

for (Tuple comment : comments) {     String review = (String) comment.get("review");     Long postId = ((Number) comment.get("postId")).longValue();       String postTitle = (String) entityManager.createNativeQuery("""         SELECT             p.title         FROM post p         WHERE p.id = :postId         """)     .setParameter("postId", postId)     .getSingleResult();       LOGGER.info(         "The Post '{}' got this review '{}'",         postTitle,         review     ); } 

You are going to trigger the N+1 query issue because, instead of one SQL query, you executed 5 (1 + 4):

SELECT     pc.id AS id,     pc.review AS review,     pc.post_id AS postId FROM post_comment pc   SELECT p.title FROM post p WHERE p.id = 1 -- The Post 'High-Performance Java Persistence - Part 1' got this review -- 'Excellent book to understand Java Persistence'      SELECT p.title FROM post p WHERE p.id = 2 -- The Post 'High-Performance Java Persistence - Part 2' got this review -- 'Must-read for Java developers'       SELECT p.title FROM post p WHERE p.id = 3 -- The Post 'High-Performance Java Persistence - Part 3' got this review -- 'Five Stars'       SELECT p.title FROM post p WHERE p.id = 4 -- The Post 'High-Performance Java Persistence - Part 4' got this review -- 'A great reference book' 

Fixing the N+1 query issue is very easy. All you need to do is extract all the data you need in the original SQL query, like this:

List<Tuple> comments = entityManager.createNativeQuery("""     SELECT         pc.id AS id,         pc.review AS review,         p.title AS postTitle     FROM post_comment pc     JOIN post p ON pc.post_id = p.id     """, Tuple.class) .getResultList();   for (Tuple comment : comments) {     String review = (String) comment.get("review");     String postTitle = (String) comment.get("postTitle");       LOGGER.info(         "The Post '{}' got this review '{}'",         postTitle,         review     ); } 

This time, only one SQL query is executed to fetch all the data we are further interested in using.

N+1 query problem with JPA and Hibernate

When using JPA and Hibernate, there are several ways you can trigger the N+1 query issue, so it’s very important to know how you can avoid these situations.

For the next examples, consider we are mapping the post and post_comments tables to the following entities:

Post and PostComment entities

The JPA mappings look like this:

@Entity(name = "Post") @Table(name = "post") public class Post {       @Id     private Long id;       private String title;       //Getters and setters omitted for brevity }   @Entity(name = "PostComment") @Table(name = "post_comment") public class PostComment {       @Id     private Long id;       @ManyToOne     private Post post;       private String review;       //Getters and setters omitted for brevity } 

FetchType.EAGER

Using FetchType.EAGER either implicitly or explicitly for your JPA associations is a bad idea because you are going to fetch way more data that you need. More, the FetchType.EAGER strategy is also prone to N+1 query issues.

Unfortunately, the @ManyToOne and @OneToOne associations use FetchType.EAGER by default, so if your mappings look like this:

@ManyToOne private Post post; 

You are using the FetchType.EAGER strategy, and, every time you forget to use JOIN FETCH when loading some PostComment entities with a JPQL or Criteria API query:

List<PostComment> comments = entityManager .createQuery("""     select pc     from PostComment pc     """, PostComment.class) .getResultList(); 

You are going to trigger the N+1 query issue:

SELECT      pc.id AS id1_1_,      pc.post_id AS post_id3_1_,      pc.review AS review2_1_  FROM      post_comment pc  SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 1 SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 2 SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 3 SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 4 

Notice the additional SELECT statements that are executed because the post association has to be fetched prior to returning the List of PostComment entities.

Unlike the default fetch plan, which you are using when calling the find method of the EnrityManager, a JPQL or Criteria API query defines an explicit plan that Hibernate cannot change by injecting a JOIN FETCH automatically. So, you need to do it manually.

If you didn't need the post association at all, you are out of luck when using FetchType.EAGER because there is no way to avoid fetching it. That's why it's better to use FetchType.LAZY by default.

But, if you wanted to use post association, then you can use JOIN FETCH to avoid the N+1 query problem:

List<PostComment> comments = entityManager.createQuery("""     select pc     from PostComment pc     join fetch pc.post p     """, PostComment.class) .getResultList();  for(PostComment comment : comments) {     LOGGER.info(         "The Post '{}' got this review '{}'",          comment.getPost().getTitle(),          comment.getReview()     ); } 

This time, Hibernate will execute a single SQL statement:

SELECT      pc.id as id1_1_0_,      pc.post_id as post_id3_1_0_,      pc.review as review2_1_0_,      p.id as id1_0_1_,      p.title as title2_0_1_  FROM      post_comment pc  INNER JOIN      post p ON pc.post_id = p.id      -- The Post 'High-Performance Java Persistence - Part 1' got this review  -- 'Excellent book to understand Java Persistence'  -- The Post 'High-Performance Java Persistence - Part 2' got this review  -- 'Must-read for Java developers'  -- The Post 'High-Performance Java Persistence - Part 3' got this review  -- 'Five Stars'  -- The Post 'High-Performance Java Persistence - Part 4' got this review  -- 'A great reference book' 

FetchType.LAZY

Even if you switch to using FetchType.LAZY explicitly for all associations, you can still bump into the N+1 issue.

This time, the post association is mapped like this:

@ManyToOne(fetch = FetchType.LAZY) private Post post; 

Now, when you fetch the PostComment entities:

List<PostComment> comments = entityManager .createQuery("""     select pc     from PostComment pc     """, PostComment.class) .getResultList(); 

Hibernate will execute a single SQL statement:

SELECT      pc.id AS id1_1_,      pc.post_id AS post_id3_1_,      pc.review AS review2_1_  FROM      post_comment pc 

But, if afterward, you are going to reference the lazy-loaded post association:

for(PostComment comment : comments) {     LOGGER.info(         "The Post '{}' got this review '{}'",          comment.getPost().getTitle(),          comment.getReview()     ); } 

You will get the N+1 query issue:

SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 1 -- The Post 'High-Performance Java Persistence - Part 1' got this review  -- 'Excellent book to understand Java Persistence'  SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 2 -- The Post 'High-Performance Java Persistence - Part 2' got this review  -- 'Must-read for Java developers'  SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 3 -- The Post 'High-Performance Java Persistence - Part 3' got this review  -- 'Five Stars'  SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 4 -- The Post 'High-Performance Java Persistence - Part 4' got this review  -- 'A great reference book' 

Because the post association is fetched lazily, a secondary SQL statement will be executed when accessing the lazy association in order to build the log message.

Again, the fix consists in adding a JOIN FETCH clause to the JPQL query:

List<PostComment> comments = entityManager.createQuery("""     select pc     from PostComment pc     join fetch pc.post p     """, PostComment.class) .getResultList();  for(PostComment comment : comments) {     LOGGER.info(         "The Post '{}' got this review '{}'",          comment.getPost().getTitle(),          comment.getReview()     ); } 

And, just like in the FetchType.EAGER example, this JPQL query will generate a single SQL statement.

Even if you are using FetchType.LAZY and don't reference the child association of a bidirectional @OneToOne JPA relationship, you can still trigger the N+1 query issue.

How to automatically detect the N+1 query issue

If you want to automatically detect N+1 query issue in your data access layer, you can use the db-util open-source project.

First, you need to add the following Maven dependency:

<dependency>     <groupId>com.vladmihalcea</groupId>     <artifactId>db-util</artifactId>     <version>${db-util.version}</version> </dependency> 

Afterward, you just have to use SQLStatementCountValidator utility to assert the underlying SQL statements that get generated:

SQLStatementCountValidator.reset();  List<PostComment> comments = entityManager.createQuery("""     select pc     from PostComment pc     """, PostComment.class) .getResultList();  SQLStatementCountValidator.assertSelectCount(1); 

In case you are using FetchType.EAGER and run the above test case, you will get the following test case failure:

SELECT      pc.id as id1_1_,      pc.post_id as post_id3_1_,      pc.review as review2_1_  FROM      post_comment pc  SELECT p.id as id1_0_0_, p.title as title2_0_0_ FROM post p WHERE p.id = 1  SELECT p.id as id1_0_0_, p.title as title2_0_0_ FROM post p WHERE p.id = 2   -- SQLStatementCountMismatchException: Expected 1 statement(s) but recorded 3 instead! 
vote vote

77

SELECT  table1.* , table2.* INNER JOIN table2 ON table2.SomeFkId = table1.SomeId 

That gets you a result set where child rows in table2 cause duplication by returning the table1 results for each child row in table2. O/R mappers should differentiate table1 instances based on a unique key field, then use all the table2 columns to populate child instances.

SELECT table1.*  SELECT table2.* WHERE SomeFkId = # 

The N+1 is where the first query populates the primary object and the second query populates all the child objects for each of the unique primary objects returned.

Consider:

class House {     int Id { get; set; }     string Address { get; set; }     Person[] Inhabitants { get; set; } }  class Person {     string Name { get; set; }     int HouseId { get; set; } } 

and tables with a similar structure. A single query for the address "22 Valley St" may return:

Id Address      Name HouseId 1  22 Valley St Dave 1 1  22 Valley St John 1 1  22 Valley St Mike 1 

The O/RM should fill an instance of Home with ID=1, Address="22 Valley St" and then populate the Inhabitants array with People instances for Dave, John, and Mike with just one query.

A N+1 query for the same address used above would result in:

Id Address 1  22 Valley St 

with a separate query like

SELECT * FROM Person WHERE HouseId = 1 

and resulting in a separate data set like

Name    HouseId Dave    1 John    1 Mike    1 

and the final result being the same as above with the single query.

The advantages to single select is that you get all the data up front which may be what you ultimately desire. The advantages to N+1 is query complexity is reduced and you can use lazy loading where the child result sets are only loaded upon first request.

vote vote

63

Supplier with a one-to-many relationship with Product. One Supplier has (supplies) many Products.

***** Table: Supplier ***** +-----+-------------------+ | ID  |       NAME        | +-----+-------------------+ |  1  |  Supplier Name 1  | |  2  |  Supplier Name 2  | |  3  |  Supplier Name 3  | |  4  |  Supplier Name 4  | +-----+-------------------+  ***** Table: Product ***** +-----+-----------+--------------------+-------+------------+ | ID  |   NAME    |     DESCRIPTION    | PRICE | SUPPLIERID | +-----+-----------+--------------------+-------+------------+ |1    | Product 1 | Name for Product 1 |  2.0  |     1      | |2    | Product 2 | Name for Product 2 | 22.0  |     1      | |3    | Product 3 | Name for Product 3 | 30.0  |     2      | |4    | Product 4 | Name for Product 4 |  7.0  |     3      | +-----+-----------+--------------------+-------+------------+ 

Factors:

  • Lazy mode for Supplier set to “true” (default)

  • Fetch mode used for querying on Product is Select

  • Fetch mode (default): Supplier information is accessed

  • Caching does not play a role for the first time the

  • Supplier is accessed

Fetch mode is Select Fetch (default)

// It takes Select fetch mode as a default Query query = session.createQuery( "from Product p"); List list = query.list(); // Supplier is being accessed displayProductsListWithSupplierName(results);  select ... various field names ... from PRODUCT select ... various field names ... from SUPPLIER where SUPPLIER.id=? select ... various field names ... from SUPPLIER where SUPPLIER.id=? select ... various field names ... from SUPPLIER where SUPPLIER.id=? 

Result:

  • 1 select statement for Product
  • N select statements for Supplier

This is N+1 select problem!

vote vote

54

I can't comment directly on other answers, because I don't have enough reputation. But it's worth noting that the problem essentially only arises because, historically, a lot of dbms have been quite poor when it comes to handling joins (MySQL being a particularly noteworthy example). So n+1 has, often, been notably faster than a join. And then there are ways to improve on n+1 but still without needing a join, which is what the original problem relates to.

However, MySQL is now a lot better than it used to be when it comes to joins. When I first learned MySQL, I used joins a lot. Then I discovered how slow they are, and switched to n+1 in the code instead. But, recently, I've been moving back to joins, because MySQL is now a heck of a lot better at handling them than it was when I first started using it.

These days, a simple join on a properly indexed set of tables is rarely a problem, in performance terms. And if it does give a performance hit, then the use of index hints often solves them.

This is discussed here by one of the MySQL development team:

http://jorgenloland.blogspot.co.uk/2013/02/dbt-3-q3-6-x-performance-in-mysql-5610.html

So the summary is: If you've been avoiding joins in the past because of MySQL's abysmal performance with them, then try again on the latest versions. You'll probably be pleasantly surprised.

Top 3 video Explaining What is the "N+1 selects problem" in ORM (Object-Relational Mapping)?







Related QUESTION?