
Spring JPA query always uses a Sequential Scan instead of an Index Scan

I have a simple query

@Query(value = "select * from some_table where consumer_id=:consumerId and store_id=:storeId and cancelled_at is null", nativeQuery = true)
fun checkIfNewConsumer(consumerId: BigInteger, storeId: BigInteger): List<SomeClass?>

When I run the query with EXPLAIN directly against the table (over 30 million rows), I get:

Index Scan using select_index on some_table (cost=0.56..8.59 rows=1 width=86) (actual time=0.015..0.015 rows=0 loops=1)
  Index Cond: ((consumer_id = 1234) AND (store_id = 4) AND (cancelled_at IS NULL))
Planning time: 0.130 ms
Execution time: 0.042 ms

When I run the same query via a request using spring boot:

{"Plan"=>{"Total Cost"=>1317517.92, "Relation Name"=>"some_table", "Parallel Aware"=>"?", "Filter"=>"?", "Alias"=>"some_table", "Node Type"=>"Seq Scan", "Plan Width"=>86, "Startup Cost"=>0.0, "Plan Rows"=>912}}
Execution time: 9613 ms

The Spring Boot plan above is from New Relic. As you can see, it defaults to a Seq Scan for every query instead of an Index Scan. I ran VACUUM ANALYZE assuming it was the database (no dice), and I tried variations of the query (no dice). It always looks perfect in psql and borks via Spring.

Any advice would be highly appreciated.

Edit 2: potential solution

We found that disabling prepared statements, by adding ?preferQueryMode=simple to the connection URL (jdbc:postgresql://localhost:5432/postgres?preferQueryMode=simple), got the query to use the index scan.

We still need to understand how this works, why it helps, and why the problem appeared now.

Edit 1: tech stack

  • Spring boot 2.0M5
  • Kotlin
  • PostgreSQL 9.6.2

Edit: SOLUTION @Vlad Mihalcea

Please don't use preferQueryMode=simple unless you are absolutely sure what it means. Apparently, your problem is described in https://gist.github.com/vlsi/df08cbef370b2e86a5c1 . I guess you have bigint in the database and BigInteger in the Kotlin code. Can you use Long in Kotlin?

–Vladimir Sitnikov

Since PostgreSQL does not have an execution plan cache, and PreparedStatement(s) are actually emulated until they reach a given threshold of executions (e.g. 5), I think you are facing an index selectivity issue here.

If this query returns only a small amount of records, the database will use the index.

If this query returns a large number of records, the database will not use the index, because the cost of random-access page reads would be higher than the cost of a sequential scan.

So, it might be that you are using different sets of bind parameter values here.

  1. The ones you've given in the psql console are highly selective, hence you get the Index Scan.
  2. The ones you send at runtime might be different, hence you get a Sequential Scan.

Also, in psql, the execution plan does not account for the networking overhead of sending all the records to the JDBC driver. However, this is a side note to your problem, not the actual root cause.

Now, to be really sure of the actual execution plans, try enabling the auto_explain module in PostgreSQL.
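For reference, a minimal postgresql.conf sketch for auto_explain might look like the following. It assumes you can restart the server (changing shared_preload_libraries requires a restart), and the 1s threshold is an arbitrary value picked for illustration; tune it to your workload. Note that log_analyze adds timing overhead to every statement, so you may not want to leave it on permanently.

```
# Load the auto_explain contrib module at server start
shared_preload_libraries = 'auto_explain'

# Log the plan of every statement that runs longer than this (assumed value)
auto_explain.log_min_duration = '1s'

# Include actual row counts and timings, as EXPLAIN ANALYZE would
auto_explain.log_analyze = on
```

The plans then show up in the PostgreSQL server log, so you see what the planner chose for the statements that Spring actually sent.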

Or, you can write a test method that runs the query like this:

// doInJPA is a test helper that runs the lambda with an EntityManager
List<Object[]> executionPlanLines = doInJPA(entityManager -> {
    // Query is Hibernate's Query here, which provides stream()
    try(Stream<Object[]> postStream = entityManager
        .createNativeQuery(
            "EXPLAIN ANALYZE " +
            "select * from some_table where consumer_id=:consumerId and store_id=:storeId and cancelled_at is null ")
        .setParameter("consumerId", consumerId)
        .setParameter("storeId", storeId)
        .unwrap(Query.class)
        .stream()
    ) {
        // Each row of the result is one line of the execution plan
        return postStream.collect( Collectors.toList() );
    }
});

LOGGER.info( "Execution plan: {}",
             executionPlanLines
             .stream()
             .map( line -> (String) line[0] )
             .collect( Collectors.joining( "\n" ) )
);

This way, you are going to see the actual execution plan running in production.

Please don't use preferQueryMode=simple unless you are absolutely sure what it means (e.g. it might be helpful for processing a logical replication stream).

Apparently your problem is described in https://gist.github.com/vlsi/df08cbef370b2e86a5c1 . I guess you have bigint in the database, and BigInteger in the Kotlin code. Can you use Long in Kotlin?

Just in case: bigint in PostgreSQL means int8 , so Long should be used in the application.
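If the ids already arrive as BigInteger somewhere in the application, one way to get a value suitable for a Long parameter is to convert it explicitly before binding. The sketch below is only an illustration: the class name IdParams and the method toLongId are hypothetical, and it uses nothing beyond the JDK. It relies on BigInteger.longValueExact(), which throws instead of silently truncating when the value does not fit in 64 bits.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of Spring or the PostgreSQL JDBC driver.
public final class IdParams {
    private IdParams() {}

    /**
     * Converts an id to a long so it can be bound as int8 (bigint).
     * Throws ArithmeticException if the value does not fit in 64 bits.
     */
    public static long toLongId(BigInteger id) {
        return id.longValueExact();
    }

    public static void main(String[] args) {
        // Same example values as in the question's EXPLAIN output
        long consumerId = toLongId(new BigInteger("1234"));
        long storeId = toLongId(BigInteger.valueOf(4));
        System.out.println(consumerId + " " + storeId);
    }
}
```

The converted values can then be passed to a repository method that declares Long parameters instead of BigInteger.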

An alternative is to add an explicit cast, like the following: consumer_id=cast(:consumerId as bigint) and store_id=cast(:storeId as bigint) .

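The problem is the same as "character column compared with numeric value"; the difference here is just a bit more subtle (int8 vs numeric). When a bigint column is compared with a numeric parameter, the column has to be cast to numeric for the comparison, so the index on the bigint column cannot be used.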
