
Improving performance of Solr and MySQL pair (connected by “WHERE IN” using JPA)

I have a MySQL database which is indexed by Solr. I carry out searches using Solr (fast), and then retrieve every result of the Solr search from the database using JPA. JPA runs a WHERE IN query on the database, which is VERY slow.

Is there a way to make this process faster, or to refactor the design to improve performance?

I have just refactored the whole application from using MySQL's fulltext search to using Solr, and now the performance is worse.

Note: I need all results immediately in order to carry out calculations on them, so I cannot use pagination.

Java code:

    SolrDocumentList documentList = response.getResults();
    Collection<String> listingIds = new ArrayList<>();
    for(SolrDocument doc : documentList) {
        String listingId = (String) doc.getFirstValue("ListingId");
        listingIds.add(listingId);
    }

    Query query = em.createNamedQuery("getAllListingsWithId");
    query.setParameter("listingIds", listingIds);
    List<ListedItemDetail> listings = query.getResultList();

Named Query:

<query>Select listing from ListingSet listing where listing.listingId in :listingIds</query>

Additional Information:

SHOW CREATE TABLE ListingSet produces [shortened]:

CREATE TABLE `listingset` (
  `LISTINGID` int(11) NOT NULL,
  `STARTDATE` datetime DEFAULT NULL,
  `STARTPRICE` decimal(10,2) DEFAULT NULL,
  `TITLE` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`LISTINGID`),
  KEY `FK_LISTINGSET_MEMBER_MEMBERID` (`MEMBER_MEMBERID`),
  CONSTRAINT `FK_LISTINGSET_MEMBER_MEMBERID` FOREIGN KEY (`MEMBER_MEMBERID`) REFERENCES `member` (`MEMBERID`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Investigating the generated SQL

Looking at the generated SQL, JPA runs a lot of SQL queries for a single JPA query. The ListingSet table is linked to 7 other tables, and JPA runs a separate SELECT query against each of those tables for EACH listingId (of which there are 1,000 - 10,000). So my one JPA query gets blown up into what looks like ~7,000 queries!

The following are just personal thoughts on debugging the problem:

  • Turn on the MySQL general query log and check whether JPA is hitting MySQL with a separate query for each listingId.

    mysql -uroot -pYOUR-PASSWORD -e "SET GLOBAL log_output = 'FILE'; SET GLOBAL general_log_file = '/tmp/mysql.log'; SET GLOBAL general_log = 'ON';"
    tail -f /tmp/mysql.log

  • Check whether the slowness comes from MySQL itself by running the equivalent SQL directly against your MySQL database.

    SELECT * FROM ListingSet WHERE ListingId IN (put your real listing IDs here);

    Make sure there is an index on the ListingId column (there is a good chance the index already exists).

  • Since you only read rows from MySQL, you could set up replication with additional slaves, split your ListingIds across the slave MySQL servers, and merge the results afterwards. http://dev.mysql.com/doc/refman/5.0/en/replication-howto.html
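The split-and-merge idea from the last suggestion can be sketched in plain Java. This is only an illustration under assumptions: the class `IdPartitionSketch` and its methods are hypothetical names, the number of replicas is made up, and the step that actually sends each chunk to a replica is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list of IDs into near-equal chunks (one per replica)
// and concatenates the partial results again. No database access is performed here.
class IdPartitionSketch {

    // Splits ids into exactly `parts` contiguous chunks whose sizes differ by at most one.
    static <T> List<List<T>> partition(List<T> ids, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        int base = ids.size() / parts;   // minimum chunk size
        int extra = ids.size() % parts;  // the first `extra` chunks get one more element
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int len = base + (i < extra ? 1 : 0);
            chunks.add(new ArrayList<>(ids.subList(start, start + len)));
            start += len;
        }
        return chunks;
    }

    // Merges the per-replica results back into one list, preserving chunk order.
    static <T> List<T> merge(List<List<T>> partialResults) {
        List<T> merged = new ArrayList<>();
        for (List<T> part : partialResults) {
            merged.addAll(part);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            ids.add(i);
        }
        // 10 ids over 3 assumed replicas gives chunks of sizes 4, 3 and 3.
        List<List<Integer>> chunks = partition(ids, 3);
        System.out.println(chunks);
        System.out.println(merge(chunks));
    }
}
```

In a real setup each chunk would become the parameter of its own WHERE IN query against one replica, ideally issued in parallel. Whether this helps depends on the database being the bottleneck in the first place.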

The problem was caused by my use of JPA. Because my entity had many relationships, a single query exploded into 1,000-10,000 queries.

The solution is to use batch fetching in JPA to prevent the ORM n + 1 query problem. Batch fetching causes JPA to request all relevant rows from a related table at once, rather than once for each entity. This solution is appropriate when a query returns many results and the entity being queried has many relationships.
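To make the difference concrete, here is a small self-contained simulation that only counts round trips; it does not use JPA or a real database. The class `BatchFetchSketch`, its methods, and the figures (1,000 listings, 7 relationships) are assumptions chosen to match the numbers in the question. In EclipseLink, batch fetching is typically requested with the `eclipselink.batch` query hint or the `@BatchFetch` annotation; the thread does not show which one was used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulation only: no real database or JPA provider is involved.
class BatchFetchSketch {
    // Number of simulated round trips to the database.
    static int queryCount = 0;

    // Simulates "SELECT ... FROM related WHERE listing_id = ?": one round trip per listing.
    static String fetchOne(int relationship, int listingId) {
        queryCount++;
        return "rel" + relationship + "-row" + listingId;
    }

    // Simulates "SELECT ... FROM related WHERE listing_id IN (...)": one round trip for all listings.
    static Map<Integer, String> fetchBatch(int relationship, List<Integer> listingIds) {
        queryCount++;
        Map<Integer, String> rows = new HashMap<>();
        for (int id : listingIds) {
            rows.put(id, "rel" + relationship + "-row" + id);
        }
        return rows;
    }

    // Loads every relationship of every listing separately (the n + 1 pattern).
    static int runLazy(List<Integer> listingIds, int relationships) {
        queryCount = 1; // the initial WHERE IN query for the listings themselves
        for (int id : listingIds) {
            for (int r = 0; r < relationships; r++) {
                fetchOne(r, id);
            }
        }
        return queryCount;
    }

    // Loads each relationship once for all listings (batch fetching).
    static int runBatched(List<Integer> listingIds, int relationships) {
        queryCount = 1; // the initial WHERE IN query for the listings themselves
        for (int r = 0; r < relationships; r++) {
            fetchBatch(r, listingIds);
        }
        return queryCount;
    }

    // Builds the ids 1..n as stand-ins for the listing ids returned by Solr.
    static List<Integer> listingIds(int n) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            ids.add(i);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Integer> ids = listingIds(1000);
        System.out.println("per-entity loading: " + runLazy(ids, 7) + " queries");    // 7001
        System.out.println("batch fetching:     " + runBatched(ids, 7) + " queries"); // 8
    }
}
```

With per-entity loading the count grows with the number of listings times the number of relationships; with batch fetching it grows only with the number of relationships.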

The easiest way to spot potential issues with JPA is to enable finer logging. For EclipseLink, add a property to persistence.xml:

  <property name="eclipselink.logging.level" value="FINEST"/>

Be aware that the logging produced under EclipseLink's default settings only displays the JPQL form of the queries.

