
Optimizing Solr DataImportHandler settings for full-import speed

I have a Solr server set up using the DataImportHandler. With my current settings, a full-import takes 8-9 hours. I'd like to tune the settings to reduce that time, but the documentation isn't very clear about what the various settings do and what side effects they have.

The server is an m2.2xlarge AWS instance (34.2 GB RAM). The Solr version is 3.6.1.2012.07.17.12.45.52. Solr is running on Tomcat 7.0.30, and Tomcat is started with -Xms4096m -Xmx28672m.

From solrconfig.xml, mergeFactor is 10, useCompoundFile is false. From data-config.xml, autoCommit is true, batchSize is -1. The query the DataImportHandler is using returns 6 million records.

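Before even looking at mergeFactor and the other index settings, look at the entities in your db-data-config.xml. If you have entities nested inside other entities, the inner entity's query is run once for every row of the outer entity, which generates a lot of SQL requests; with 6 million parent rows that means millions of extra queries. You need to either rework your SQL so that it does not need inner entities (for example, with a join) or look at CachedSqlEntityProcessor, which reads the inner entity's rows once and serves later lookups from memory.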
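As a rough sketch of the second option, a nested entity can be switched to CachedSqlEntityProcessor as below. The table and column names (item, feature, item_id, description) are made up for illustration and are not from the question, and the dataSource element is left out because the database in use isn't stated. The `where` attribute pairs the child column with the parent entity's column.

```xml
<dataConfig>
  <!-- dataSource element omitted; keep whatever the existing config defines -->
  <document>
    <!-- Outer entity: one query that returns all parent rows -->
    <entity name="item" query="SELECT id, name FROM item">
      <!-- Inner entity: with CachedSqlEntityProcessor the query below is
           run once and its rows are cached in memory, instead of one
           query being issued per parent row. Hypothetical schema. -->
      <entity name="feature"
              processor="CachedSqlEntityProcessor"
              query="SELECT item_id, description FROM feature"
              where="item_id=item.id"/>
    </entity>
  </document>
</dataConfig>
```

The cache lives on the JVM heap, so this trades memory for fewer round trips to the database; whether the inner table fits comfortably in the heap you have configured is something to check before relying on it.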
