
SOLR performance

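I'm using SolrJ + Solr in my project. The problem is that I'm facing an unclear bottleneck related to Solr/Jetty.

Using jvisualvm, I connected to the JVM instance that Solr runs under and found that 77% of the time is spent in the method "org.eclipse.jetty.io.ByteArrayBuffer.readFrom()". The stack trace of one of the threads is as follows: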

"qtp64700533-36718" - Thread t@36718
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391)
    at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
    at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1040)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)

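So time spent on I/O would look fine, but:

  1. The application that runs the queries is started on the same machine (so I/O time should not be that large, and the thread state "RUNNABLE" in the stack trace looks suspicious)
  2. Query response times can be as long as 5-10 seconds
  3. The load average on the machine (CentOS) is about 10

Any help or advice is appreciated, thanks!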

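UPD:
Indeed, guys, I forgot to provide additional information. Here it is:

Hardware: i3770, 32 GB RAM; iotop shows reads of 50-600 KB/s and writes of 200-1000 KB/s (almost all of it from the SOLR process)
OS: CentOS 6.6
Java: OpenJDK 64-Bit Server VM (1.7.0_71 24.65-b04)
Solr: 4.9.0 (started with -Xmx=24000, but I think I should split the SOLR cores into separate JVM SOLR instances to minimize GC time)
SolrJ: 4.10.3; adding/updating/deleting documents is done from Java code with commitWithin = 10000 ms.

About the schema: I store information (ads + objects) for 5 countries in SOLR: UA, RU, PL, BY, KZ. Each country has 2 cores, e.g. for Ukraine: ua_ads and ua_objects (10 cores in total). The schemas are almost identical across countries; see the Ukrainian ones below.

The "ua_ads" schema (although it should be renamed from "example" :))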

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <fieldType name="int"       class="solr.TrieIntField"   precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="long"      class="solr.TrieLongField"  precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="boolean"   class="solr.BoolField"      sortMissingLast="true"/>
  <fieldType name="tdate"     class="solr.TrieDateField"  precisionStep="6" positionIncrementGap="0"/>
  <fieldType name="string"    class="solr.StrField"       sortMissingLast="true" />
  <fieldType name="text_ru"   class="solr.TextField"      positionIncrementGap="100"/>

  <field name="_version_" type="long" indexed="true" stored="true"/>

  <uniqueKey>adId</uniqueKey>

  <field name="adId"          type="long"     indexed="true"    stored="true"   required="true"/>
  <field name="objectId"      type="long"     indexed="true"    stored="true"   required="false"/>
  <field name="url"           type="string"   indexed="false"   stored="true"   required="true"/>
  <field name="regionId"      type="int"      indexed="false"   stored="true"   required="true"/>
  <field name="sourceId"      type="int"      indexed="false"   stored="true"   required="true"/>
  <field name="type"          type="int"      indexed="false"   stored="true"   required="true"/>
  <field name="title"         type="text_ru"  indexed="false"   stored="true"   required="true"/>
  <field name="address"       type="text_ru"  indexed="false"   stored="true"   required="true"/>
  <field name="text"          type="text_ru"  indexed="false"   stored="true"   required="true"/>
  <field name="dateFound"     type="tdate"    indexed="true"    stored="true"   required="true"/>
  <!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
  <field name="phoneNumbers"  type="string"   indexed="true"    stored="true"   required="true"   multiValued="true"/>
  <field name="priceLocal"    type="long"     indexed="false"   stored="true"   required="false"/>
  <field name="priceUsd"      type="long"     indexed="false"   stored="true"   required="false"/>
  <field name="currency"      type="int"      indexed="false"   stored="true"   required="false"/>

  <field name="roomsCount"    type="int"      indexed="false"   stored="true"   required="false"/>
  <field name="area"          type="int"      indexed="false"   stored="true"   required="false"/>
  <field name="imagesCount"   type="int"      indexed="true"    stored="true"   required="true"/>
</schema>

The "ua_objects" schema

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">

  <fieldType name="int"     class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="long"    class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="float"   class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
  <fieldType name="tdate"   class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
  <fieldType name="string"  class="solr.StrField" sortMissingLast="true" />
  <fieldtype name="binary"  class="solr.BinaryField"/>

  <fieldType name="addr_ru" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- no stemming for address, dots must be followed by space: "г. Киев" -->
      <!-- char filters always run first (preprocessing) -->
      <charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- replacing all except letters, removing "-" in home address (9-А) -->
      <filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
      <!-- replacing all except letters, removing "-" in home address ("9-а" => "9а") -->
      <filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
      <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/cities_ukr2rus.txt"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
      <!-- 1-length is for case with home letters: "Хрещатик, 3" -->
      <filter class="solr.LengthFilterFactory" min="1" max="64"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt,lang/stopwords_addr.txt" format="snowball"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- dots must be followed by space: "г. Киев" -->
      <!-- char filters always run first (preprocessing) -->
      <charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
      <!-- replacing all except letters, removing "-" in home address ("9-а" => "9а") -->
      <filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
      <filter class="solr.LengthFilterFactory" min="1" max="64"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball"/>
      <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/synonyms.txt"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
    </analyzer>
  </fieldType>

  <field name="_version_" type="long" indexed="true" stored="true"/>

  <uniqueKey>objectId</uniqueKey>

  <field name="objectId"      type="long"     indexed="true"    stored="true"   required="true"/>
  <field name="url"           type="string"   indexed="false"   stored="true"   required="true"/>
  <field name="regionId"      type="int"      indexed="true"    stored="true"   required="true"/>
  <field name="sourceId"      type="int"      indexed="false"   stored="true"   required="true"/>
  <field name="type"          type="int"      indexed="true"    stored="true"   required="true"/>
  <field name="address"       type="addr_ru"  indexed="true"    stored="true"   required="true"/>
  <field name="title"         type="text_ru"  indexed="true"    stored="true"   required="true"/>
  <field name="text"          type="text_ru"  indexed="true"    stored="true"   required="true"/>
  <field name="dateFound"     type="tdate"    indexed="true"    stored="true"   required="true"/>
  <!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
  <field name="phoneNumbers"  type="string"   indexed="true"    stored="true"   required="true"   multiValued="true"/>
  <field name="ownerDetected" type="boolean"  indexed="true"    stored="true"   required="true"/>
  <field name="priceUsd"      type="long"     indexed="true"    stored="true"   required="false"/>
  <field name="priceLocal"    type="long"     indexed="false"   stored="true"   required="false"/>
  <field name="currency"      type="int"      indexed="false"   stored="true"   required="false"/>
  <field name="roomsCount"    type="int"      indexed="true"    stored="true"   required="false"/>
  <field name="area"          type="int"      indexed="true"    stored="true"   required="false"/>

  <field name="dateUpdated"   type="tdate"    indexed="true"    stored="true"   required="true"/>
  <field name="dateClosed"    type="tdate"    indexed="true"    stored="true"   required="false"/>
  <field name="m2priceRel"    type="float"    indexed="true"    stored="true"   required="false"/>
  <field name="ceddData"      type="binary"   indexed="false"   stored="true"   required="false"  multiValued="true"/>
  <field name="imagesCount"   type="int"      indexed="true"    stored="true"   required="true"/>
  <field name="uniqAdTexts"   type="string"   indexed="false"   stored="true"   required="true"   multiValued="true"/>
</schema>

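The largest indexes:
ru_ads: 2.99 GB
ru_objects: 3.25 GB
ua_ads: 5.45 GB
ua_objects: 2.36 GB
The indexes of the other cores are relatively small.

Queries that take too long ("too long" as seen from the client side) look like this (taken from the SOLR log; "????" is just non-English letters):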

400723188 [qtp64700533-40547] INFO  org.apache.solr.core.SolrCore  ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+????????\+???????\+????????)+AND+type:3+AND+regionId:2+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[2+TO+2])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+60])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[23500+TO+70500])+AND+dateUpdated:[2014-12-09T10:23:07Z+TO+2015-01-28T10:23:07Z]+AND+-objectId:(27824841)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=18 status=0 QTime=287

401989528 [qtp64700533-40830] INFO  org.apache.solr.core.SolrCore  ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(?????????????\+??????)+AND+type:4+AND+regionId:162+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+58])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9+TO+27])+AND+dateUpdated:[2014-12-09T10:44:08Z+TO+2015-01-28T10:44:08Z]+AND+-objectId:(26415616)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=820 status=0 QTime=5755

400832723 [qtp64700533-40322] INFO  org.apache.solr.core.SolrCore  ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(????????\+???????)+AND+type:4+AND+regionId:102+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[31+TO+45])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[115+TO+343])+AND+dateUpdated:[2014-12-09T10:24:57Z+TO+2015-01-28T10:24:57Z]+AND+-objectId:(26415342)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=9 status=0 QTime=372

402069370 [qtp64700533-40832] INFO  org.apache.solr.core.SolrCore  ? [ru-objects] webapp=/solr path=/select params={mm=1&fl=*&start=0&q=(????????\+?????????\+??\+????????)+AND+type:3+AND+regionId:135+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[28+TO+40])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9529+TO+28585])+AND+dateUpdated:[2014-10-30T10:45:33Z+TO+2015-01-28T10:45:33Z]+AND+-objectId:(26415855)&qf=address^20+title^2+text&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=14075 status=0 QTime=544

401805198 [qtp64700533-40233] INFO  org.apache.solr.core.SolrCore  ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+??\+??????\+?????\+??????????)+AND+type:3+AND+regionId:16+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[3+TO+3])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[93+TO+95])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[284050+TO+313950])+AND+dateUpdated:[2015-01-08T10:41:09Z+TO+2015-01-28T10:41:09Z]+AND+-objectId:(27826334)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=6 status=0 QTime=462
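
To compare hit counts against query times across many such log lines, the relevant fields can be extracted with a regular expression. The following is only a minimal sketch: the class name `SolrLogLine` and its fields are hypothetical, and the pattern assumes the log layout shown above (`[core] webapp=... hits=N status=S QTime=T`, with QTime in milliseconds).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls core name, hit count and QTime out of a Solr request log line. */
final class SolrLogLine {
    // Assumes the layout seen in the log lines above: "[core] webapp=... hits=N status=S QTime=T".
    private static final Pattern LINE = Pattern.compile(
            "\\[([\\w-]+)\\] webapp=.*\\bhits=(\\d+) status=(\\d+) QTime=(\\d+)");

    final String core;
    final long hits;
    final long qTimeMs;

    private SolrLogLine(String core, long hits, long qTimeMs) {
        this.core = core;
        this.hits = hits;
        this.qTimeMs = qTimeMs;
    }

    /** Returns the parsed entry, or null when the line is not a request log entry. */
    static SolrLogLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new SolrLogLine(m.group(1), Long.parseLong(m.group(2)), Long.parseLong(m.group(4)));
    }

    public static void main(String[] args) {
        // Shortened sample line; the params block is abbreviated for readability.
        String sample = "401989528 [qtp64700533-40830] INFO  org.apache.solr.core.SolrCore  ? "
                + "[ru-objects] webapp=/solr path=/select params={rows=2147483647} "
                + "hits=820 status=0 QTime=5755";
        SolrLogLine parsed = parse(sample);
        System.out.println(parsed.core + " hits=" + parsed.hits + " QTime=" + parsed.qTimeMs + " ms");
    }
}
```

Sorting the parsed entries by `qTimeMs` makes it easier to see whether slow queries cluster on one core or on large hit counts.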

Here is the latest profiling screenshot from jvisualvm: [image: screenshot from jvisualvm]

Part of the "top" command output, delay = 10 sec: [image: top output]

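You are passing the parameter rows=2147483647 in every query. The meaning of that parameter is (quoting the reference):

You can use the rows parameter to paginate results from a query. The parameter specifies the maximum number of documents from the complete result set that Solr should return to the client at one time.

The default value is 10. That is, by default, Solr returns 10 documents at a time in response to a query.

So you are telling Solr to send practically every hit found for the query in a single response. That is why your performance is poor.

Would Google send you all 500,000,000 hits it finds for the query "java"? No. Why not? Every IR application I know of gives you a small first page of results, which is what keeps searching fast.

This is the reason for the high I/O: Solr fetches the records from disk and writes them into the response. That is I/O, nothing more.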
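
As a rough illustration of paging instead of requesting everything at once, the sketch below splits a result set into (start, rows) windows. The class `PageWindows` and the page size of 100 are assumptions made for this example, not something taken from the question; in SolrJ each window would be passed to `SolrQuery.setStart(...)` and `SolrQuery.setRows(...)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a result set into (start, rows) windows for paged requests. */
final class PageWindows {
    /** Returns one {start, rows} pair per page needed to cover totalHits documents. */
    static List<int[]> windows(int totalHits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<int[]> pages = new ArrayList<>();
        // A long loop variable avoids int overflow when totalHits is close to Integer.MAX_VALUE.
        for (long start = 0; start < totalHits; start += pageSize) {
            int rows = (int) Math.min(pageSize, totalHits - start);
            pages.add(new int[] {(int) start, rows});
        }
        return pages;
    }

    public static void main(String[] args) {
        // 820 hits, as in one of the slow ru-objects queries, fetched 100 documents at a time.
        for (int[] page : windows(820, 100)) {
            System.out.println("start=" + page[0] + "&rows=" + page[1]);
        }
    }
}
```

In practice the first request reports the total number of hits, so a client can stop after the first page whenever that is all it needs.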

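Since you are using this for analysis and want to extract all matches, you should look into the new streaming export feature. Unfortunately, it is only available as of Solr 4.10, and you are running 4.9.0.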
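
For orientation, a request to the export feature might be built as in the sketch below. This is an assumption-laden illustration, not a tested setup: it assumes the `/export` request handler that, to the best of my knowledge, ships with the Solr 4.10 example configuration, and it assumes that every field named in `sort` and `fl` has docValues enabled, which the schemas in the question do not declare. The class `ExportUrl` and the base URL are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a request URL for Solr's /export handler (assumed Solr 4.10+). */
final class ExportUrl {
    /**
     * coreUrl is the base URL of a core, e.g. "http://localhost:8983/solr/ua-objects" (assumed).
     * sort and fl are assumed to name only fields that have docValues enabled.
     */
    static String build(String coreUrl, String q, String sort, String fl)
            throws UnsupportedEncodingException {
        return coreUrl + "/export?q=" + enc(q) + "&sort=" + enc(sort) + "&fl=" + enc(fl);
    }

    // URL-encodes a single parameter value as UTF-8.
    private static String enc(String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build(
                "http://localhost:8983/solr/ua-objects",
                "type:3 AND regionId:2",
                "objectId asc",
                "objectId,priceUsd"));
    }
}
```

Using it against the cores described in the question would require both an upgrade to 4.10 or later and a reindex with docValues on the exported fields.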

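You could also upgrade to SSDs, which helps Solr performance considerably.

Finally, look at your cache levels. If you don't update often and some caches are full, you can increase the default sizes. If you do update frequently, larger caches won't help as much, because the caches are invalidated on every commit.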


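Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.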

 