
Why is my JPA query getting slower on each loop iteration?

I need to load all the data from a DB table and put it into a search index (Elasticsearch, to be specific). Using an ES river is not an option in my case.

Here is what I see: the query uses a fixed batch size (e.g. 5000 entries), and I execute it in a loop to fetch the batches, increasing the offset with every iteration. The first iteration takes about 19 seconds; the 4th already takes about 50 seconds.
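One plausible explanation, offered only as a toy model and not as a measurement of H2: many databases answer an ordered query with an offset by walking the ordered rows from the beginning and discarding the skipped ones, so each later batch touches more rows than the one before. The sketch below (the class `OffsetCostModel` and its method are hypothetical, standard Java only) counts the rows touched per batch under that assumption.

```java
final class OffsetCostModel {
    // Toy cost model (an assumption, not a measurement): to answer
    // "ORDER BY id" with OFFSET o and LIMIT n, the database walks the ordered
    // rows from the start, discards the first o rows and returns the next n.
    // Rows touched by one query is therefore o + n, capped at the table size.
    static long rowsTouched(long offset, long limit, long totalRows) {
        return Math.min(offset + limit, totalRows);
    }

    public static void main(String[] args) {
        long totalRows = 7_000_000L;
        long batchSize = 5_000L;
        // Batches 0..3 correspond to the first four loop iterations.
        for (int i = 0; i < 4; i++) {
            long touched = rowsTouched(i * batchSize, batchSize, totalRows);
            System.out.println("batch " + i + ": " + touched + " rows touched");
        }
    }
}
```

Under this model the 4th batch touches 20,000 rows against 5,000 for the first, i.e. the cost per select grows steadily with the offset instead of staying constant.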

The table has 7 million rows in my case, and production data will be larger by a factor of at least 3, so if the execution time keeps growing like this, my approach won't get anywhere (it already fails at 7 million entries). Later on I can certainly use multiple threads to select the data, but first I want to keep the time per select constant, if possible.
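A rough worked estimate of why this matters at production scale, assuming (not measured) that the database satisfies each offset query by reading and discarding the skipped rows, so that with batch size s and N rows the k-th query touches (k+1)s rows. The total over a full pass then grows with the square of N; the final empty query is ignored:

```latex
\begin{aligned}
B &= N / s \\
\text{rows touched} &= \sum_{k=0}^{B-1} (k+1)\,s = s\,\frac{B(B+1)}{2} \\
N = 7\times10^{6},\ s = 5000 &\Rightarrow B = 1400,\ \text{rows touched} = 4{,}903{,}500{,}000 \approx 4.9\times10^{9} \\
N = 2.1\times10^{7},\ s = 5000 &\Rightarrow B = 4200,\ \text{rows touched} = 44{,}110{,}500{,}000 \approx 4.4\times10^{10}
\end{aligned}
```

Under this assumption, tripling the row count multiplies the rows touched by about 9, whereas reading each row exactly once would only triple the work.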

Where is this loss in performance coming from, and how can I avoid it or at least minimize it?

The table I'm selecting from only has an id (long) column and a document (CLOB) column.

I'm using an H2 database with those 7 million rows in it; maybe that's the reason? I'm not familiar with how H2 performs on a table of this size.

My first guess was the garbage collector, so I had a look with VisualVM, but it seemed OK. I also tried clearing all caches within the session factory on each iteration, with no change in behavior, so I guess I'm on the wrong track there.

```java
EntityManager em = persistenceUtils.openEm();
// em.setProperty("javax.persistence.cache.storeMode", CacheStoreMode.BYPASS);
// em.setProperty("javax.persistence.cache.retrieveMode", CacheRetrieveMode.BYPASS);
Query selectAll = em.createQuery("Select d from Document d order by d.id");

// First batch: rows 0 .. BATCH_SIZE - 1
List<Document> documents = selectAll.setFirstResult(0).setMaxResults(BATCH_SIZE).getResultList();
List<ListenableActionFuture<BulkResponse>> bulkResponses = Lists.newArrayList(addBulkIndexRequests(documents, client));
int i = 1;
while (documents != null && !documents.isEmpty()) {
    long batchStartTime = System.nanoTime();
    // Next batch: skip i * BATCH_SIZE rows (offset paging)
    documents = selectAll.setFirstResult(i * BATCH_SIZE).setMaxResults(BATCH_SIZE).getResultList();
    long batchEndTime = System.nanoTime();
    System.out.println("+++ SELECTED BATCH " + i + " in " + (batchEndTime - batchStartTime) / 1_000_000_000.0 + " SECONDS +++");
    if (documents.isEmpty()) {
        break; // nothing left to index
    }
    addBulkIndexRequests(documents, client);
    System.out.println("+++ ADDED BATCH " + i + " +++");
    i++;
}
persistenceUtils.closeEm(em);
```
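A commonly used alternative to offset paging is keyset (seek) pagination. It is named here plainly as a swapped-in technique and is not necessarily what the answerers suggested. Instead of skipping i * BATCH_SIZE rows, remember the last id of each batch and ask for the next rows with a larger id, so the database can seek via the index on id. In JPA, the per-batch query could look like `select d from Document d where d.id > :lastId order by d.id` with `setParameter("lastId", lastId)` and `setMaxResults(BATCH_SIZE)`. Calling `em.clear()` after indexing each batch may also help, because otherwise every loaded Document stays managed in the same EntityManager for the whole run. The sketch below keeps the loop database-independent: `KeysetPager`, `BatchSource` and `inMemory` are hypothetical names, and the in-memory source only stands in for that query. It assumes id is the indexed primary key and that no id equals Long.MIN_VALUE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class KeysetPager {
    /** Hypothetical stand-in for one query: ids greater than afterId, ascending, at most limit rows. */
    interface BatchSource {
        List<Long> fetchAfter(long afterId, int limit);
    }

    // Reads all ids in ascending order, one batch at a time, and hands every
    // non-empty batch to the consumer. Returns the number of batches processed.
    static int forEachBatch(BatchSource source, int batchSize, Consumer<List<Long>> consumer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long lastId = Long.MIN_VALUE; // assumes no real id equals Long.MIN_VALUE
        int batches = 0;
        while (true) {
            List<Long> batch = source.fetchAfter(lastId, batchSize);
            if (batch.isEmpty()) {
                return batches;
            }
            consumer.accept(batch);
            lastId = batch.get(batch.size() - 1); // key of the last row seen
            batches++;
        }
    }

    // In-memory fake of the table; sortedIds must be in ascending order.
    static BatchSource inMemory(List<Long> sortedIds) {
        return (afterId, limit) -> {
            List<Long> out = new ArrayList<>();
            for (long id : sortedIds) {
                if (id > afterId) {
                    out.add(id);
                    if (out.size() >= limit) {
                        break;
                    }
                }
            }
            return out;
        };
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 12; id++) {
            ids.add(id);
        }
        int batches = forEachBatch(inMemory(ids), 5, batch -> System.out.println("indexing " + batch));
        System.out.println(batches + " batches");
    }
}
```

With 12 ids and a batch size of 5 this processes three batches, the last one holding ids 11 and 12. Each call only needs the previous batch's last id, so with a real index the cost per batch should not depend on how far into the table the loop has progressed.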

It seems H2 was the problem here. I installed Oracle 11g locally and ran the select queries against it, and consistently got an access time of around 0.44 s for each batch of 1000 entries.

It has to be said, though, that the solution I tested against the Oracle DB also included the suggestions from Predrag Marcic and Andrei.
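For a sense of scale, here is a rough estimate from the Oracle figure above. It assumes the per-batch select time stays at about 0.44 s, the batches are read one after another, and indexing time is not counted. The second line uses 21 million rows, the lower bound implied by the production factor of 3:

```latex
\begin{aligned}
\frac{7\,000\,000\ \text{rows}}{1000\ \text{rows/batch}} &= 7000\ \text{batches}, & 7000 \times 0.44\ \text{s} &= 3080\ \text{s} \approx 51\ \text{min} \\
\frac{21\,000\,000\ \text{rows}}{1000\ \text{rows/batch}} &= 21\,000\ \text{batches}, & 21\,000 \times 0.44\ \text{s} &= 9240\ \text{s} \approx 2.6\ \text{h}
\end{aligned}
```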
