
Hibernate (PostgreSQL) slow select query WHERE clause on foreign key compared to jdbc

I wrote an app that scrapes internet radio playlists and saves them to a database. To learn about Hibernate I migrated the app to use it, but I've seen a large performance drop when doing a SELECT ... WHERE lookup compared to my earlier attempts. The same procedure (fetching around 17,000 tracks, grouped by the programme they were played on and who played them) took about 150ms in my Python SQLite prototype and about 250ms in the initial Java version using Apache DB Utils, while my (probably horrific) Hibernate version takes about 1100ms.

@Override
public DJAllProgrammes getAllProgrammesFromDJ(Collection<String> names) {
    DJAllProgrammes djAllProgrammes = new DJAllProgrammes();
    session.beginTransaction();
    List<Presenter> result = session.createQuery("from Presenter p WHERE p.presenter_name in :names", Presenter.class)
            .setParameterList("names", names)
            .getResultList();
    for (Presenter presenter : result) {
        int presenter_id = presenter.getPresenter_id();
        List<Programme> programmes = session
                .createQuery("from programme prog WHERE prog.presenter_origin_id = :pres_orig_id", Programme.class)
                .setParameter("pres_orig_id", presenter_id)
                .getResultList();
        for (Programme programme : programmes) {
            //this is the critical performance death zone 
            List<Track> tracksOnThisProgramme = session
                    .createQuery("FROM track t WHERE t.programme.programme_id in :progIds", Track.class)
                    .setParameter("progIds", programme.getProgramme_id())
                    .getResultList();
            djAllProgrammes.addProgramme(new ProgrammeData(presenter.getPresenter_name(), programme.getDate(), tracksOnThisProgramme));
        }
    }
    session.getTransaction().commit();
    return djAllProgrammes;
}

Debug info:

INFO: Session Metrics

{
    33339 nanoseconds spent acquiring 1 JDBC connections;
    71991 nanoseconds spent releasing 1 JDBC connections;
    12938819 nanoseconds spent preparing 258 JDBC statements;
    88949720 nanoseconds spent executing 258 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    4671332 nanoseconds spent executing 1 flushes (flushing a total of 9130 entities and 0 collections);
    599862735 nanoseconds spent executing 258 partial-flushes (flushing a total of 1079473 entities and 1079473 collections)
}

Looking around the internet, I saw a suggestion, aimed at having WAY too many entities in the transaction, to "use pagination and smaller batch increments". I can find information about what pagination is, but not much about what "using smaller batch increments" means.

I'm in a bind: this app had fine performance doing basically the same thing with Apache DB Utils (a lightweight JDBC wrapper), and I'm so ignorant I don't even really know what to search for to speed this up. Help a brother out?

The beans (persistence entities?) used here are at https://pastebin.com/pSQ3iGK2

Generally speaking, an OR mapper lets you model entities in relation to one another. I see some 1:N relations in your code, for example a presenter having many programmes.

Rather than looking up 'Programme' rows by foreign key yourself, model a @OneToMany relationship to 'Programme' in class 'Presenter'. (The class name 'Programme' is fine as it is: it is the British spelling of the singular, not a plural.)

When you do so, you only have to fire one Hibernate query, provided the association is fetched along with it (for example with a join fetch). The entities of type 'Presenter' that it finds will each contain a list/set of 'Programme'. Iterate over the entities and convert them to your return value 'DJAllProgrammes', which should contain only plain values (DTOs) and no references to the entities. In other words, map the entities to DTOs.
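
A minimal sketch of that entity-to-DTO mapping, written in plain Java so it runs without Hibernate on the classpath. The `programmes` collection, the `ProgrammeDto` class and the `toDtos` helper are hypothetical names, not taken from the pastebin; in the real application the list on `Presenter` would be the @OneToMany side and the stand-in classes would be the mapped entities.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class DtoMappingSketch {

    // Stand-in for the Presenter entity. Assumption: in the real class this
    // list would be mapped with @OneToMany(mappedBy = "presenter").
    static class Presenter {
        final String name;
        final List<Programme> programmes = new ArrayList<>();
        Presenter(String name) { this.name = name; }
    }

    // Stand-in for the Programme entity (tracks simplified to their titles).
    static class Programme {
        final LocalDate date;
        final List<String> trackTitles = new ArrayList<>();
        Programme(LocalDate date) { this.date = date; }
    }

    // Hypothetical DTO: plain values only, no references back to entities.
    static class ProgrammeDto {
        final String presenterName;
        final LocalDate date;
        final List<String> trackTitles;
        ProgrammeDto(String presenterName, LocalDate date, List<String> trackTitles) {
            this.presenterName = presenterName;
            this.date = date;
            this.trackTitles = trackTitles;
        }
    }

    // Walks presenter -> programmes and copies the values into DTOs.
    static List<ProgrammeDto> toDtos(List<Presenter> presenters) {
        List<ProgrammeDto> result = new ArrayList<>();
        for (Presenter presenter : presenters) {
            for (Programme programme : presenter.programmes) {
                // Copy the track list so the DTO does not share the entity's collection.
                result.add(new ProgrammeDto(presenter.name, programme.date,
                        new ArrayList<>(programme.trackTitles)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Presenter presenter = new Presenter("Some DJ");
        Programme programme = new Programme(LocalDate.of(2020, 1, 31));
        programme.trackTitles.add("Track A");
        programme.trackTitles.add("Track B");
        presenter.programmes.add(programme);

        List<ProgrammeDto> dtos = toDtos(List.of(presenter));
        System.out.println(dtos.size() + " programme(s), first has "
                + dtos.get(0).trackTitles.size() + " track(s)");
    }
}
```

With real entities, the list passed to `toDtos` would be the result of the single Presenter query, and the mapping would need to run while the session is still open if any association is loaded lazily.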

Using an ORM like Hibernate for this task will generally be slower than the DB Utils code in your initial Java version, which uses the JDBC layer directly. Consider what is happening:

List<Presenter> result = session.createQuery("from Presenter p WHERE p.presenter_name in :names", Presenter.class)
        .setParameterList("names", names)
        .getResultList();

After the query is parsed and its objects are resolved, the size of names determines how many parameters the :names placeholder is expanded into (?,?,?...).
Then the query is sent, and once the results come in, the state of each one is held twice: once in the entity you are given in the result list, and once in a snapshot that is kept internally to check for changes.
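
To make those two steps concrete, here is a rough illustration in plain Java. It is not Hibernate's actual implementation: `expandInList` and `isDirty` are hypothetical helpers that only mimic the idea of expanding one named list parameter into positional placeholders and of comparing an entity's current values against the snapshot taken at load time.

```java
import java.util.Arrays;
import java.util.Collections;

class QueryPreparationSketch {

    // Builds the "(?, ?, ?)" part for a list parameter of the given size.
    static String expandInList(int parameterCount) {
        if (parameterCount < 1) {
            throw new IllegalArgumentException("an IN list needs at least one value");
        }
        return "(" + String.join(", ", Collections.nCopies(parameterCount, "?")) + ")";
    }

    // An entity counts as changed when any current value differs from the
    // value recorded in the snapshot taken when it was loaded.
    static boolean isDirty(Object[] loadedSnapshot, Object[] currentValues) {
        return !Arrays.equals(loadedSnapshot, currentValues);
    }

    public static void main(String[] args) {
        // Three names in the collection give three placeholders.
        System.out.println("... where presenter_name in " + expandInList(3));
        // prints "... where presenter_name in (?, ?, ?)"

        Object[] snapshot = {1, "Some DJ"};
        Object[] unchanged = {1, "Some DJ"};
        Object[] renamed = {1, "Another DJ"};
        System.out.println(isDirty(snapshot, unchanged)); // prints "false"
        System.out.println(isDirty(snapshot, renamed));   // prints "true"
    }
}
```

The comparison has to be repeated for every managed entity each time the session checks for changes, which is why a session holding many thousands of entities gets expensive.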

for (Presenter presenter : result) {
    int presenter_id = presenter.getPresenter_id();
    List<Programme> programmes = session
            .createQuery("from programme prog WHERE prog.presenter_origin_id = :pres_orig_id", Programme.class)
            .setParameter("pres_orig_id", presenter_id)
            .getResultList();

Here the same thing happens again, except it's actually a bit worse: rather than reusing the query, you are creating a new one on every iteration and discarding it afterwards.
The same thing happens in the nested loop.
Also, if presenter.getPresenter_id() returns an Integer object rather than a primitive int, you're doing an unnecessary unboxing followed by a re-boxing in the .setParameter("pres_orig_id", presenter_id) call. In that case, declare the variable as Integer presenter_id. If the method returns a primitive int, the change is not necessary, though it won't hurt, since the value is only ever passed on as an Object. You could even pass the getter call directly to setParameter.

Taken as a whole, when you move the createQuery calls out of the loops, you get this:

@Override
public DJAllProgrammes getAllProgrammesFromDJ(Collection<String> names) {
    DJAllProgrammes djAllProgrammes = new DJAllProgrammes();
    session.beginTransaction();
    List<Presenter> result = session.createQuery("from Presenter p WHERE p.presenter_name in :names", Presenter.class)
            .setParameterList("names", names)
            .getResultList();
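    // Create each query once and reuse it inside the loops.
    // TypedQuery is javax.persistence.TypedQuery (jakarta.persistence.TypedQuery on newer Hibernate versions).
    TypedQuery<Programme> progByPresenterOrigin = session
            .createQuery("from programme prog WHERE prog.presenter_origin_id = :pres_orig_id", Programme.class);
    TypedQuery<Track> trackByProgrammeId = session
            .createQuery("FROM track t WHERE t.programme.programme_id in :progIds", Track.class);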
    for (Presenter presenter : result) {
        List<Programme> programmes = progByPresenterOrigin
                .setParameter("pres_orig_id", presenter.getPresenter_id())
                .getResultList();
        for (Programme programme : programmes) {
            //this is the critical performance death zone 
            List<Track> tracksOnThisProgramme = trackByProgrammeId
                    .setParameter("progIds", programme.getProgramme_id())
                    .getResultList();
            djAllProgrammes.addProgramme(new ProgrammeData(presenter.getPresenter_name(), programme.getDate(), tracksOnThisProgramme));
        }
    }
    session.getTransaction().commit();
    return djAllProgrammes;
}
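
The session metrics in the question give a way to judge how far this change can go. The sketch below only redoes the arithmetic on those figures. Reading a "partial-flush" as the automatic dirty check Hibernate runs before executing a query inside a transaction is an interpretation of the log, not something the question confirms, and `SessionMetricsSketch` and its helpers are hypothetical names.

```java
class SessionMetricsSketch {

    // Converts a duration in nanoseconds to milliseconds.
    static double nanosToMillis(long nanos) {
        return nanos / 1_000_000.0;
    }

    // Whole-number average, e.g. entities checked per partial-flush.
    static long averagePerOperation(long total, long operations) {
        return total / operations;
    }

    public static void main(String[] args) {
        // Figures copied from the "Session Metrics" log in the question.
        long preparingNanos = 12_938_819L;
        long executingNanos = 88_949_720L;
        long finalFlushNanos = 4_671_332L;
        long partialFlushNanos = 599_862_735L;
        long partialFlushCount = 258L;
        long entitiesChecked = 1_079_473L;

        System.out.println("preparing statements (ms): " + nanosToMillis(preparingNanos));
        System.out.println("executing statements (ms): " + nanosToMillis(executingNanos));
        System.out.println("final flush (ms):          " + nanosToMillis(finalFlushNanos));
        System.out.println("partial-flushes (ms):      " + nanosToMillis(partialFlushNanos));
        System.out.println("entities per partial-flush: "
                + averagePerOperation(entitiesChecked, partialFlushCount));
    }
}
```

On these figures the 258 partial-flushes account for roughly 600ms, against roughly 89ms spent executing the statements, and each partial-flush covers about 4,184 entities on average. Reusing the query objects still executes 258 statements, so if that reading of the log is right, issuing fewer queries (or keeping fewer managed entities in the session) is likely to matter more than where createQuery is called.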
