
Optimal Solr JVM/Virtual/Physical Memory Configuration

Our company has several different ways of getting leads, and several types of leads we deal with. There are only slight differences between each type of lead, and much of the information is shared with or related to one or more other lead types. My team and I are trying to build/configure a Solr index that handles each of these lead types and all their shared data: customer data, resort data, etc. (around 1.2 million records in all). We're currently hosting an Ubuntu server (12 GB RAM, 8-core Opteron), running Tomcat 6 and Solr 3.4.

I'd like the index to add records in real time when a customer submits a lead-gen form on our website (around 1,500-2,000 daily), as well as update when employees add or modify data (around 2,500-3,000 times daily).

In addition, I need customers on the website and employees in house to be able to quickly search this data with filters, facets, auto-complete, highlighting, and all the things one has come to expect from a well-written search.

This setup is currently functioning, but often hangs while updating records, both on the website and in our internal apps. Commits are done every 1000 documents or 5 seconds, and I optimize once daily. What are the optimal JVM, server, or Solr configurations for this type of setup? Any help would be appreciated, and I can provide as much information as needed to anyone willing to help.
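For reference, the commit policy described above maps onto the <autoCommit> block of solrconfig.xml; a minimal sketch in Solr 3.x syntax (maxTime is in milliseconds):

    <!-- solrconfig.xml: commit every 1000 pending documents or every 5 seconds -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>1000</maxDocs> <!-- commit once this many docs are pending -->
        <maxTime>5000</maxTime> <!-- or once 5000 ms have elapsed, whichever comes first -->
      </autoCommit>
    </updateHandler>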

First, you should not optimize.

There are two common errors when configuring the JVM heap size in Solr:

  • giving too much memory to the JVM (the OS cache won't be able to cache disk operations),
  • giving too little memory to the JVM (there will be a lot of pressure on the garbage collector, which will be forced to run frequent stop-the-world collections; use JMX monitoring to figure out whether full GCs get triggered, as in the sketch after this list).
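A minimal sketch of what that might look like on this 12 GB box, assuming Tomcat 6; the 4 GB heap, the CMS collector, and the JMX port are illustrative assumptions to be tuned, not recommendations:

    # /etc/default/tomcat6 (or bin/setenv.sh) -- hypothetical values, tune for your workload.
    # A fixed 4 GB heap leaves roughly 8 GB of the 12 GB for the OS page cache;
    # CMS is the concurrent low-pause collector on JVMs of this era;
    # the jmxremote flags let jconsole/VisualVM watch the full-GC counters remotely.
    CATALINA_OPTS="-Xms4g -Xmx4g \
     -XX:+UseConcMarkSweepGC \
     -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/tomcat6/gc.log \
     -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9010 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false"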

One other reason why your application may hang is background merges. Lucene is based on segments, and whenever the number of segments gets higher than mergeFactor, a merge is triggered. A low value of mergeFactor might explain the hangs.
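For context, both merge-related knobs mentioned here live in the indexing section of solrconfig.xml; a sketch with the stock Solr 3.x defaults, shown only to indicate where they are set:

    <!-- solrconfig.xml (Solr 3.x): segment merging and indexing buffer -->
    <indexDefaults>
      <mergeFactor>10</mergeFactor>         <!-- higher values merge less often, at the cost of more segments to search -->
      <ramBufferSizeMB>32</ramBufferSizeMB> <!-- flush a new segment once this much RAM is buffered -->
    </indexDefaults>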

You should give more details on your current setup so that we can help you:

  • JVM heap size,
  • which collector you are using (G1, throughput collector, concurrent low-pause collector, ...),
  • index size (on disk, not the number of documents),
  • mergeFactor, ramBufferSizeMB, ...
