[英]SOLR performance
我在我的項目中使用SolrJ + Solr。 問題是我面臨有關Solr / Jetty的不清楚的瓶頸
使用jvisualvm,我連接到Solr在其下啟動的JVM實例,並發現方法“ org.eclipse.jetty.io.ByteArrayBuffer.readFrom()”中花費了77%的時間,其中一個線程的堆棧跟蹤如下:
"qtp64700533-36718" - Thread t@36718
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391)
at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1040)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
因此,花費在I / O上的時間看起來不錯,但是:
任何幫助/建議表示贊賞,謝謝!
UPD:
的確,伙計們,我忘了提供其他信息。 這里是:
硬件 :i3770,32gb ram,根據iotop,它顯示50-600kb / sec的讀取速度,200-1000kb / sec的寫入速度(幾乎與SOLR進程有關)
操作系統 :Centos 6.6
java :OpenJDK 64位服務器VM(1.7.0_71 24.65-b04)
solr :4.9.0(以-Xmx = 24000啟動,但是我認為應該將SOLR內核拆分為分離的JVM SOLR實例,以最大程度地減少GC時間)
solrj :4.10.3,在Java代碼中以commitWithIn = 10000毫秒完成添加/更新/刪除文檔。
關於模式:我正在SOLR數據(廣告+對象)中存儲有關5個國家/地區的信息:UA,RU,PL,BY,KZ。 因此,每個國家/地區都有2個核心,例如烏克蘭:ua_ads和ua_objects(總共10個核心)國家/地區之間的架構幾乎相同,請參見下文中的烏克蘭
“ ua_ads”模式(盡管應該從“ example”重命名它:))
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>adId</uniqueKey>
<field name="adId" type="long" indexed="true" stored="true" required="true"/>
<field name="objectId" type="long" indexed="true" stored="true" required="false"/>
<field name="url" type="string" indexed="false" stored="true" required="true"/>
<field name="regionId" type="int" indexed="false" stored="true" required="true"/>
<field name="sourceId" type="int" indexed="false" stored="true" required="true"/>
<field name="type" type="int" indexed="false" stored="true" required="true"/>
<field name="title" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="address" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="text" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="dateFound" type="tdate" indexed="true" stored="true" required="true"/>
<!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
<field name="phoneNumbers" type="string" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="priceLocal" type="long" indexed="false" stored="true" required="false"/>
<field name="priceUsd" type="long" indexed="false" stored="true" required="false"/>
<field name="currency" type="int" indexed="false" stored="true" required="false"/>
<field name="roomsCount" type="int" indexed="false" stored="true" required="false"/>
<field name="area" type="int" indexed="false" stored="true" required="false"/>
<field name="imagesCount" type="int" indexed="true" stored="true" required="true"/>
</schema>
“ ua_objects”架構
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldtype name="binary" class="solr.BinaryField"/>
<fieldType name="addr_ru" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<!-- no stemming for address, dots must me followed by space: "г. Киев" -->
<!-- char filters is always firs (preprocessing) -->
<charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- replacing all except letters, removing "-" in home address (9-А) -->
<filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
<!-- replacing all except letters, removing "-" in home address ("9-а" => "9а") -->
<filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/cities_ukr2rus.txt"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
<!-- 1-length is for case with home letters: "Хрещатик, 3" -->
<filter class="solr.LengthFilterFactory" min="1" max="64"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt,lang/stopwords_addr.txt" format="snowball"/>
</analyzer>
</fieldType>
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<!-- dots must me followed by space: "г. Киев" -->
<!-- char filters is always firs (preprocessing) -->
<charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
<!-- replacing all except letters, removing "-" in home address ("9-а" => "9а") -->
<filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
<filter class="solr.LengthFilterFactory" min="1" max="64"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball"/>
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/synonyms.txt"/>
<filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
</analyzer>
</fieldType>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>objectId</uniqueKey>
<field name="objectId" type="long" indexed="true" stored="true" required="true"/>
<field name="url" type="string" indexed="false" stored="true" required="true"/>
<field name="regionId" type="int" indexed="true" stored="true" required="true"/>
<field name="sourceId" type="int" indexed="false" stored="true" required="true"/>
<field name="type" type="int" indexed="true" stored="true" required="true"/>
<field name="address" type="addr_ru" indexed="true" stored="true" required="true"/>
<field name="title" type="text_ru" indexed="true" stored="true" required="true"/>
<field name="text" type="text_ru" indexed="true" stored="true" required="true"/>
<field name="dateFound" type="tdate" indexed="true" stored="true" required="true"/>
<!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
<field name="phoneNumbers" type="string" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="ownerDetected" type="boolean" indexed="true" stored="true" required="true"/>
<field name="priceUsd" type="long" indexed="true" stored="true" required="false"/>
<field name="priceLocal" type="long" indexed="false" stored="true" required="false"/>
<field name="currency" type="int" indexed="false" stored="true" required="false"/>
<field name="roomsCount" type="int" indexed="true" stored="true" required="false"/>
<field name="area" type="int" indexed="true" stored="true" required="false"/>
<field name="dateUpdated" type="tdate" indexed="true" stored="true" required="true"/>
<field name="dateClosed" type="tdate" indexed="true" stored="true" required="false"/>
<field name="m2priceRel" type="float" indexed="true" stored="true" required="false"/>
<field name="ceddData" type="binary" indexed="false" stored="true" required="false" multiValued="true"/>
<field name="imagesCount" type="int" indexed="true" stored="true" required="true"/>
<field name="uniqAdTexts" type="string" indexed="false" stored="true" required="true" multiValued="true"/>
</schema>
最大指標:
ru_ads:2.99GB
ru_objects:3.25gb
ua_ads:5.45GB
ua_objects:2.36gb
其他核心指標相對較小
運行時間太長的查詢(從客戶端“太長”)看起來像這樣(從SOLR日志中獲取,“ ????”只是非英語字母)
400723188 [qtp64700533-40547] INFO org.apache.solr.core.SolrCore ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+????????\+???????\+????????)+AND+type:3+AND+regionId:2+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[2+TO+2])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+60])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[23500+TO+70500])+AND+dateUpdated:[2014-12-09T10:23:07Z+TO+2015-01-28T10:23:07Z]+AND+-objectId:(27824841)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=18 status=0 QTime=287
401989528 [qtp64700533-40830] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(?????????????\+??????)+AND+type:4+AND+regionId:162+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+58])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9+TO+27])+AND+dateUpdated:[2014-12-09T10:44:08Z+TO+2015-01-28T10:44:08Z]+AND+-objectId:(26415616)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=820 status=0 QTime=5755
400832723 [qtp64700533-40322] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(????????\+???????)+AND+type:4+AND+regionId:102+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[31+TO+45])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[115+TO+343])+AND+dateUpdated:[2014-12-09T10:24:57Z+TO+2015-01-28T10:24:57Z]+AND+-objectId:(26415342)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=9 status=0 QTime=372
402069370 [qtp64700533-40832] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=1&fl=*&start=0&q=(????????\+?????????\+??\+????????)+AND+type:3+AND+regionId:135+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[28+TO+40])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9529+TO+28585])+AND+dateUpdated:[2014-10-30T10:45:33Z+TO+2015-01-28T10:45:33Z]+AND+-objectId:(26415855)&qf=address^20+title^2+text&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=14075 status=0 QTime=544
401805198 [qtp64700533-40233] INFO org.apache.solr.core.SolrCore ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+??\+??????\+?????\+??????????)+AND+type:3+AND+regionId:16+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[3+TO+3])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[93+TO+95])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[284050+TO+313950])+AND+dateUpdated:[2015-01-08T10:41:09Z+TO+2015-01-28T10:41:09Z]+AND+-objectId:(27826334)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=6 status=0 QTime=462
這是jvisualvm的最新概要分析屏幕截圖
“ top”命令的一部分,延遲= 10秒
您在每個查詢中都給定了參數rows=2147483647
。 該參數的含義是(引用自)
您可以使用rows參數對查詢結果進行分頁。 該參數指定完整結果集中Solr一次應返回給客戶端的最大文檔數。
默認值為10。即,默認情況下,Solr一次返回10個文檔以響應查詢。
因此,您要告訴Solr實際上是在單個響應中發送針對查詢找到的所有匹配。 這就是您表現不佳的原因。
Google是否會向您發送在查詢“ java”時找到的所有500.000.000個匹配項,否。 為什么不呢? 我知道的每個IR應用程序都會為您提供一小頁的第一頁結果,從而使搜索效果良好。
這是I / O較高的原因,solr從磁盤中獲取記錄並將它們寫入響應中。 這是I / O,僅此而已。
由於您正在使用它進行分析,並且想提取所有匹配項,因此您應該研究新的流導出功能。 不幸的是,它僅在Solr 4.10中可用。
您還可以更新到SSD-這對於提高Solr性能非常有利。
最后,查看您的緩存級別。 如果您不經常更新並且某些緩存已滿,則可以增加默認值。 如果您確實經常更新,那么它就不會像在提交時使緩存失效一樣沒有好處。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.