Solr commit and optimize questions

I have a classifieds website. Users can post ads, edit ads, view ads, etc.

Whenever a user posts an ad, I add a document to Solr. However, I don't know when to commit it. From what I have read, committing slows things down.

How should I do it? Autocommit every 12 hours or so?

Also, how should I handle optimize?

A little more detail on commit/optimize:

Commit: When you index documents into Solr, none of the changes you make will appear in search results until you run the commit command. So when to run commit really depends on how quickly you want changes to show up on your site through the search engine. However, it is a heavy operation, so it should be done in batches rather than after every update.
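The batching idea above can be sketched in Python as follows. This is only an illustration under assumptions: the core name `ads`, the host `localhost:8983`, the batch size, and the 60-second `commitWithin` value are all made up, and it assumes a Solr version whose `/update` handler accepts a JSON array of documents. The helper names `batched` and `build_update_request` are hypothetical, not part of any Solr client.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def batched(docs: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_update_request(base_url: str, docs: list,
                         commit_within_ms: int) -> urllib.request.Request:
    """Build (but do not send) a JSON update request.

    commitWithin asks Solr to commit within the given number of
    milliseconds instead of committing on every add.
    """
    url = f"{base_url}/update?commitWithin={commit_within_ms}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical ads; field names are assumptions for illustration.
    ads = [{"id": str(i), "title": f"ad {i}"} for i in range(5)]
    for batch in batched(ads, 2):
        req = build_update_request("http://localhost:8983/solr/ads", batch, 60000)
        print(req.full_url, len(batch))
        # urllib.request.urlopen(req)  # uncomment to send to a running Solr
```

Sending a few documents per request and letting Solr decide when to commit keeps the per-ad cost low while still bounding how stale the search results can get.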

Optimize: This is similar to a defrag command on a hard drive. It merges the index's segments into fewer, larger ones (increasing search speed) and removes any deleted (replaced) documents. Solr's index segments are effectively write-once, so every time you re-index a document it marks the old document as deleted and then creates a brand new document to replace it. Optimize removes these deleted documents. You can compare the searchable document count with the deleted document count by going to the Solr Statistics page and looking at the numDocs vs. maxDoc numbers. The difference between the two numbers is the number of deleted (non-searchable) documents in the index.
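As a small worked example of the numDocs vs. maxDoc comparison, the arithmetic can be written as below. The function name `deleted_doc_stats` is hypothetical and the sample numbers are invented; you would read the real values off the Solr Statistics page.

```python
def deleted_doc_stats(num_docs: int, max_doc: int) -> tuple:
    """Return (deleted_count, deleted_fraction) for an index.

    num_docs: searchable documents (numDocs on the statistics page)
    max_doc:  searchable plus deleted documents (maxDoc)
    """
    if num_docs < 0 or max_doc < num_docs:
        raise ValueError("expected 0 <= num_docs <= max_doc")
    deleted = max_doc - num_docs
    fraction = deleted / max_doc if max_doc else 0.0
    return deleted, fraction


if __name__ == "__main__":
    # Invented sample values, not taken from a real index.
    deleted, fraction = deleted_doc_stats(num_docs=900, max_doc=1000)
    print(f"{deleted} deleted documents ({fraction:.0%} of the index)")
```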

Also, optimize builds a whole NEW index from the old one and then switches to the new index when complete. The command therefore needs up to double the space to run, so you should make sure that the size of your index does not exceed 50% of your available hard drive space. (This is a rule of thumb; it usually needs less than 50% because deleted documents are dropped from the new index.)
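A rough pre-flight check for that rule of thumb might look like this in Python. It is a sketch, not a Solr feature: the helper names are hypothetical, and `index_dir` is a placeholder that you would point at your own index data directory.

```python
import os
import shutil


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def has_room_to_optimize(index_size: int, free_bytes: int) -> bool:
    """Rule of thumb: optimize may need room for a full second copy."""
    return free_bytes >= index_size


if __name__ == "__main__":
    index_dir = "."  # placeholder: replace with your index data directory
    size = dir_size_bytes(index_dir)
    free = shutil.disk_usage(index_dir).free
    print(f"index: {size} bytes, free: {free} bytes, "
          f"room to optimize: {has_room_to_optimize(size, free)}")
```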

Index Server / Search Server: Paul Brown was right that the best design for Solr is to have a server dedicated and tuned to indexing, and then replicate the changes to the searching servers. You can tune the index server to have multiple index end points.

eg: http://solrindex01/index1; http://solrindex01/index2

And since the index server is not serving searches, you can set it up with a different memory footprint, different index warming commands, etc.

Hope this is useful info for everyone.

Actually, committing often and optimizing makes things really slow. It's too heavy.

After a day of searching and reading, I found out the following:

1- Optimize causes the index to double in size while it is being optimized, and makes things really slow.

2- Committing after each add is NOT a good idea; it's better to commit a couple of times a day, and then optimize once a day at most.

3- Commit should be configured through "autoCommit" in the solrconfig.xml file and tuned there according to your needs.
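To make point 3 concrete, here is a hedged sketch of what an autoCommit block in solrconfig.xml can look like, embedded in a short Python script that reads the settings back. The thresholds (10000 documents, 900000 ms, i.e. 15 minutes) are invented for illustration and should be tuned to your own needs; the helper `read_auto_commit` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Example fragment of solrconfig.xml; the numbers are assumptions.
SOLRCONFIG_FRAGMENT = """
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>900000</maxTime>
  </autoCommit>
</updateHandler>
"""


def read_auto_commit(xml_text: str) -> dict:
    """Return the autoCommit thresholds found in the fragment.

    maxDocs: commit once this many documents are pending
    maxTime: commit once this many milliseconds have passed
    """
    root = ET.fromstring(xml_text)
    node = root.find(".//autoCommit")
    if node is None:
        return {}
    return {child.tag: int(child.text) for child in node}


if __name__ == "__main__":
    settings = read_auto_commit(SOLRCONFIG_FRAGMENT)
    for name, value in settings.items():
        print(f"{name} = {value}")
```

With a block like this, Solr commits on its own once either threshold is reached, so the application code never has to issue a commit per ad.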

The usual way to handle this is to perform commit/optimize operations on a Solr node that sits outside the request path for your users. This requires additional hardware, but it ensures that the performance penalty of indexing doesn't affect your users. Replication is then used to periodically ship optimized index files from the master node to the nodes that serve search queries for users.

Try it first. It would be really bad if you avoided a simple and elegant solution just because you read that it might cause a performance problem. In other words, avoid premature optimization.
