简体   繁体   中英

Efficient way to query records greater than 10k in JPA

I need to query a table based on certain input parameters. Due to constant increase in the number of records findby method is gradually taking more time to execute which is causing an exception during runtime.

What is the best way to fine tune and query huge records in a shot? Any help would be appreciated.

Repo:

   @Query("SELECT ap FROM AP ap WHERE ap.workCentre = :workCentre "
        + "AND ap.lastUpdated>:lastUpdated "
        + "AND ap.stagingIndicator>1 "
        + "AND (ap.productCode='XYA' OR ap.productCode='ABC')")
List<AP> findAPByStationAndLastUpdated(@Param("wc") Integer wc, @Param("lastUpdated") Date lastUpdated);

Entity:

public class AP implements Serializable {

    @Id
    private String id;
    

    @OneToOne(optional = false,  fetch = FetchType.LAZY, cascade = CascadeType.ALL, orphanRemoval = true)
    @NotFound(action = NotFoundAction.IGNORE)
    @JoinColumn(name = "sk", referencedColumnName = "sk")
    private AStop aStop;
}

With large data, processing it as a stream is the best, and usually only, way.

Instead of returning List<> , return Stream<> :

@Query("SELECT ap FROM AP ap WHERE ap.workCentre = :workCentre "
        + "AND ap.lastUpdated>:lastUpdated "
        + "AND ap.stagingIndicator>1 "
        + "AND (ap.productCode='XYA' OR ap.productCode='ABC')")
Stream<AP> findAPByStationAndLastUpdated(@Param("wc") Integer wc, @Param("lastUpdated") Date lastUpdated);

Example usage:

findAPByStationAndLastUpdated(someInt, someDate).forEach(this:processAp);

with:

private void processAp(AP ap) {
    // do something with ap
}

As of now it's taking 70-80s to query 10k records

Unless these records are huge (due to containing large CLOB or BLOB that are eagerly loaded, or referencing other entities that are eagerly loaded), that delay is not JPA, but the database.

You should therefore do the usual troubleshooting steps for slow queries: Check that you have created appropriate indices in the database, look at an explain plan to see what the database is doing, or have its management console suggest better indices.

Judging from the query, an index like

create index on AP(workcentre, productcode, lastupdated, stagingindicator, and, then, all, remaining, columns)

or, if the stagingindicator has few different values,

create index on AP(workcentre, productcode, stagingindicator, lastupdated, and, then, all, remaining, columns)

should be ideal and allow for the query to run in a fraction of a second (by allowing the database to satisfy the query with a couple of index range scans).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM