How do I view data crawled by Nutch using Solr?

Question

I am new to Nutch and Solr, so I apologize in advance if this is a basic question.

Environment details:

Virtual machine with guest OS: Ubuntu 12.04.4, host OS: Windows 8
Nutch version: Apache Nutch 1.7
Solr version: Apache Solr 3.6.2
Reference: wiki.apache.org/nutch/NutchTutorial

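I started the crawl with this command:

bin/nutch crawl urls -solr http://mylocalhost:8983/solr/ -depth 3 -topN 5

The command ran successfully, with no errors.

After that, I opened the Solr admin page in my browser and tried searching with the default query string *:*. However, this produced a page with the following content: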

This XML file does not appear to have any style information associated with it. The document tree is shown below.
<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
        <lst name="params">
            <str name="start">0</str>
            <str name="q">*:*</str>
            <str name="rows">10</str>
            <str name="indent">on</str>
            <str name="version">2.2</str>
        </lst>
    </lst>
    <result name="response" numFound="0" start="0"/>
</response>

When I tried searching for "nutch" in Solr, it resulted in the error "HTTP ERROR 400".

Could you help me view the data that was crawled, so that I can verify it?

Answer 1

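The simplest way to verify the data sounds like what you are already trying to do: query the data and make sure it returns the expected results. A few pointers:

When you say you tried the basic query string, do you mean through the Solr admin page or through the REST API? If you are using Solr admin, you do not need to escape the * first, so q would simply be *:*. In the REST API, the query has to be properly URL-encoded (in the URL below, the colon is encoded as %3A). Like this: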

http://your_host_name:8888/solr/your_core_name/select?q=*%3A*&wt=json&indent=true
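
If you would rather not encode the query by hand, here is a minimal Python sketch that builds the same URL and reads the hit count from a JSON response. The function names build_select_url and num_found are made up for illustration, and the host, port, and core name are the same placeholders as in the URL above; adjust them to your setup.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, query="*:*"):
    """Return a Solr /select URL with the query parameters URL-encoded.

    base_url is assumed to point at the core, e.g.
    http://your_host_name:8888/solr/your_core_name (placeholder values).
    """
    # safe="*" keeps the asterisk literal; the colon is still encoded as %3A.
    params = urlencode({"q": query, "wt": "json", "indent": "true"}, safe="*")
    return f"{base_url.rstrip('/')}/select?{params}"


def num_found(response_text):
    """Extract the number of matching documents from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


url = build_select_url("http://your_host_name:8888/solr/your_core_name")
print(url)
```

You can fetch the URL with urllib.request.urlopen(url).read() and pass the decoded body to num_found. A result of 0 for *:* would mean the index is empty, which matches the numFound="0" in the XML response in the question.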

Another thing you can do to verify some of Nutch's intermediate data is to dump the crawl or link database using the bin/nutch commands readdb, readlinkdb, and mergedb.
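
As a rough sketch of what those dumps could look like, the Python snippet below only prints the command lines so you can review them before running anything. It assumes the usual Nutch 1.x layout, where the crawl directory contains crawldb and linkdb subdirectories. The paths "crawl" and "dump" and the helper name nutch_inspect_commands are hypothetical, so substitute your actual crawl directory.

```python
import shlex


def nutch_inspect_commands(crawl_dir, dump_dir):
    """Build bin/nutch command lines for inspecting a crawl.

    crawl_dir and dump_dir are assumptions about your local layout.
    """
    crawldb = f"{crawl_dir}/crawldb"
    linkdb = f"{crawl_dir}/linkdb"
    return [
        # Summary statistics: number of URLs and their fetch status counts.
        ["bin/nutch", "readdb", crawldb, "-stats"],
        # Plain-text dump of every entry in the crawl database.
        ["bin/nutch", "readdb", crawldb, "-dump", f"{dump_dir}/crawldb"],
        # Plain-text dump of the inlinks recorded in the link database.
        ["bin/nutch", "readlinkdb", linkdb, "-dump", f"{dump_dir}/linkdb"],
    ]


for cmd in nutch_inspect_commands("crawl", "dump"):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Run the printed commands from the Nutch installation directory. If the -stats output shows no fetched URLs, the problem is on the crawl side rather than in Solr.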
