
Improving performance of Solr and MySQL pair (connected by “WHERE IN” using JPA)

I have a MySQL database which is indexed by Solr. I carry out searches using Solr (fast), and I retrieve every result in the Solr search from the database using JPA. JPA runs a WHERE IN query on the database which is VERY slow.

Is there a way to make this process faster, or to refactor the design to improve performance?

I have just refactored the whole application from using MySQL's fulltext search to use Solr, and now the performance is worse.

Note: I need all results immediately to carry out calculations on them, so I cannot use pagination.

Java code:

    // Collect the ID of every listing returned by the Solr search.
    SolrDocumentList documentList = response.getResults();
    Collection<String> listingIds = new ArrayList<>();
    for (SolrDocument doc : documentList) {
        String listingId = (String) doc.getFirstValue("ListingId");
        listingIds.add(listingId);
    }

    // Load the matching entities from MySQL through the named JPQL query (WHERE ... IN).
    Query query = em.createNamedQuery("getAllListingsWithId");
    query.setParameter("listingIds", listingIds);
    List<ListedItemDetail> listings = query.getResultList();

Named Query:

<query>Select listing from ListingSet listing where listing.listingId in :listingIds</query>

Additional Information:

SHOW CREATE TABLE ListingSet produces [shortened]:

CREATE TABLE `listingset` (
  `LISTINGID` int(11) NOT NULL,
  `STARTDATE` datetime DEFAULT NULL,
  `STARTPRICE` decimal(10,2) DEFAULT NULL,
  `TITLE` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`LISTINGID`),
  KEY `FK_LISTINGSET_MEMBER_MEMBERID` (`MEMBER_MEMBERID`),
  CONSTRAINT `FK_LISTINGSET_MEMBER_MEMBERID` FOREIGN KEY (`MEMBER_MEMBERID`) REFERENCES `member` (`MEMBERID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Investigating the generated SQL

Looking at the generated SQL, JPA runs a lot of SQL queries for a single JPA query. The ListingSet table is linked to 7 other tables, and JPA runs a separate SELECT query against each of those tables for EACH listingId (of which there are 1,000 to 10,000). So my one JPA query gets blown up into what looks like ~7,000 queries!

The following are just my personal thoughts on debugging the problem:

  • Turn on the MySQL general query log and check that JPA is not hitting MySQL with a separate query for each listingId.

    mysql -uroot -pYOUR-PASSWORD -e "SET GLOBAL log_output = 'FILE'; SET GLOBAL general_log_file = '/tmp/mysql.log'; SET GLOBAL general_log = 'ON';"
    tail -f /tmp/mysql.log

  • Check whether the slowness is caused by MySQL itself by running the equivalent SQL directly against your MySQL database.

    SELECT * FROM ListingSet WHERE listingId IN (put your real listing IDs here);

    Make sure there is an index on the ListingId column. It is very likely already there: in the table definition above, LISTINGID is the primary key.

  • Since you only read rows from MySQL, you could set up replication to additional slaves, split your listing IDs across the slave servers, and merge the results afterwards. http://dev.mysql.com/doc/refman/5.0/en/replication-howto.html
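
The split-and-merge idea in the last point could start from something like the sketch below. It is not part of the original answer: `IdPartitioner`, `partition`, and the chunk size are hypothetical names chosen for illustration, and the per-replica query is left out. Each chunk would be bound as the `listingIds` parameter of its own query, and the result lists concatenated afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original post): splits a list of IDs
// into consecutive chunks so that each chunk can be queried separately.
class IdPartitioner {

    static <T> List<List<T>> partition(List<T> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            // Copy the sub-list so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Example IDs are made up; real ones would come from the Solr response.
        List<String> listingIds = List.of("101", "102", "103", "104", "105");
        for (List<String> chunk : partition(listingIds, 2)) {
            System.out.println(chunk);
        }
    }
}
```

Whether this helps depends on where the time actually goes. If the cost is the number of queries rather than the size of the IN list, splitting the IDs alone will not reduce it.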

The problem was caused by the way I was using JPA. Because my entity had many relationships, a single query exploded into 1,000-10,000 queries.

The solution is to use batch fetching in JPA to prevent the ORM n + 1 query problem (EclipseLink exposes this through the `@BatchFetch` annotation and the `eclipselink.batch` query hint). With batch fetching, JPA requests all relevant rows from each related table at once, rather than issuing one query per entity. This solution is appropriate when a query returns many results and the entity being queried has many relationships.
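
To make the difference in query counts concrete, here is a plain-Java simulation. It is only a sketch and involves no JPA or EclipseLink calls: `BatchFetchDemo`, `RelatedTable`, and the other names are hypothetical, and a "query" is simply a counted method call on an in-memory map. It assumes 7 related tables and 1,000 listing IDs, matching the figures in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (hypothetical names, no JPA involved) that counts
// simulated round trips with and without batching.
class BatchFetchDemo {

    // Stand-in for a related table keyed by listing ID.
    static class RelatedTable {
        final Map<Integer, String> rows = new HashMap<>();
        int queryCount = 0;

        // One round trip per listing: the n + 1 pattern.
        String findOne(int listingId) {
            queryCount++;
            return rows.get(listingId);
        }

        // One round trip for all listings: the batched pattern (WHERE ... IN).
        Map<Integer, String> findAll(List<Integer> listingIds) {
            queryCount++;
            Map<Integer, String> found = new HashMap<>();
            for (int id : listingIds) {
                if (rows.containsKey(id)) {
                    found.put(id, rows.get(id));
                }
            }
            return found;
        }
    }

    static List<Integer> listingIds(int count) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= count; id++) {
            ids.add(id);
        }
        return ids;
    }

    static List<RelatedTable> buildTables(int tableCount, List<Integer> ids) {
        List<RelatedTable> tables = new ArrayList<>();
        for (int t = 0; t < tableCount; t++) {
            RelatedTable table = new RelatedTable();
            for (int id : ids) {
                table.rows.put(id, "table" + t + "-row" + id);
            }
            tables.add(table);
        }
        return tables;
    }

    static int queriesWithoutBatching(int tableCount, List<Integer> ids) {
        int total = 1; // the initial query that loads the listings themselves
        for (RelatedTable table : buildTables(tableCount, ids)) {
            for (int id : ids) {
                table.findOne(id);
            }
            total += table.queryCount;
        }
        return total;
    }

    static int queriesWithBatching(int tableCount, List<Integer> ids) {
        int total = 1; // the initial query that loads the listings themselves
        for (RelatedTable table : buildTables(tableCount, ids)) {
            table.findAll(ids);
            total += table.queryCount;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ids = listingIds(1000);
        System.out.println("without batching: " + queriesWithoutBatching(7, ids)); // 7001
        System.out.println("with batching: " + queriesWithBatching(7, ids));       // 8
    }
}
```

The first count matches the roughly 7,000 queries seen in the generated SQL. The second is what one would hope for when every relationship is batch-fetched, though the exact number in a real application depends on the mappings and the provider.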

The easiest way to spot potential issues with JPA is to enable finer logging. For EclipseLink, add a property to persistence.xml:

  <property name="eclipselink.logging.level" value="FINEST"/>

Be aware that the logging EclipseLink produces under its default settings only displays the JPQL form of the queries.

