Posting to Bluemix Retrieve_and_Rank returns status 0, but it does not work

Question

I am trying to index some web pages in the Bluemix Retrieve and Rank service. I crawled my seeds with Nutch 1.11, dumped the crawled data (about 9000 URLs) as files, and posted those that can be posted directly, e.g. XML files, to my collection:

Post_url = '"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/%s/solr/%s/update"' %(solr_cluster_id, solr_collection_name)
cmd ='''curl -X POST -H %s -u %s %s --data-binary @%s''' %(Cont_type_xml, solr_credentials, Post_url, myfilename)
subprocess.call(cmd,shell=True)

and converted the rest to JSON with the Bluemix Document Conversion service:

doc_conv_url = '"https://gateway.watsonplatform.net/document-conversion/api/v1/convert_document?version=2015-12-15"'
cmd ='''curl -X POST -u %s -F config="{\\"conversion_target\\":\\"answer_units\\"}" -F file=@%s %s''' %(doc_conv_credentials, myfilename, doc_conv_url)
process = subprocess.Popen(cmd, shell= True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

and then saved these JSON results in a JSON file and posted it to my collection:

Post_converted_url = '"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/%s/solr/%s/update/json/docs?commit=true&split=/answer_units/id&f=id:/answer_units/id&f=title:/answer_units/title&f=body:/answer_units/content/text"' %(solr_cluster_id, solr_collection_name)
cmd ='''curl -X POST -H %s -u %s %s --data-binary @%s''' %(Cont_type_json, solr_credentials, Post_converted_url, Path_jsonFile)
subprocess.call(cmd,shell=True)

Everything seems to work. The JSON file looks as it should, and when I post the data I receive status 0, which I thought meant the posting had succeeded. But when I send queries:

pysolr_client = retrieve_and_rank.get_pysolr_client(solr_cluster_id, solr_collection_name)
results = pysolr_client.search(Query_term)
print(results.docs)

the result is empty; it finds nothing. I have done the same thing before, with the same command structure, and it worked. I just made a new collection and now it doesn't work.

Has my data been indexed? If so, why does the query not work? When I request usage statistics for my Solr cluster, the result is:

{"disk_usage":{"used_bytes":2210,"total_bytes":34359738368,"used":"2.1582 KB","total":"32 GB","percent_used":6.4319465309381485E-6},
"memory_usage":{"used_bytes":2069028864,"total_bytes":4194304000,"used":"1.9269 GB","total":"3.9063 GB","percent_used":49.3294921875}}

I took this to mean that my data had been indexed and was stored in my cluster. But I just realized that the disk usage and memory usage do not change when I post my data. Does that mean the posting is not actually happening, even though I receive status 0? If so, any ideas on what the problem is and why it is happening?
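A status of 0 in Solr's responseHeader only says the request was processed without error; it does not say how many documents ended up in the index. A minimal sketch of telling the two apart, assuming JSON responses: the helper names and the two sample response bodies below are hypothetical, shaped like Solr's usual JSON output.

```python
import json


def update_succeeded(update_body):
    """True when responseHeader.status is 0, i.e. Solr raised no error."""
    return json.loads(update_body)["responseHeader"]["status"] == 0


def indexed_count(select_body):
    """Number of documents matched by a query, read from response.numFound."""
    return json.loads(select_body)["response"]["numFound"]


# Hypothetical bodies: an update that "succeeded" and a match-all query
# against a collection that is still empty.
update_body = '{"responseHeader": {"status": 0, "QTime": 12}}'
select_body = (
    '{"responseHeader": {"status": 0, "QTime": 1},'
    ' "response": {"numFound": 0, "start": 0, "docs": []}}'
)

print(update_succeeded(update_body))  # did the request go through without error?
print(indexed_count(select_body))     # how many documents the query matched
```

With the pysolr client used above, the same count should be available as `pysolr_client.search('*:*').hits`, which is a more direct check than watching disk usage.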

Does it have anything to do with the solr_config?
Any help or ideas on how to get results from a query would be highly appreciated.

Answer 1

The URL used for posting the converted files has to split the data by /answer_units, not by /answer_units/id, so it should be:

Post_converted_url = '"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/%s/solr/%s/update/json/docs?commit=true&split=/answer_units&f=id:/answer_units/id&f=title:/answer_units/title&f=body:/answer_units/content/text"' %(solr_cluster_id, solr_collection_name)

Pay attention to the split=/answer_units part.
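To see why the split path matters, here is a rough pure-Python model of what split and the f mappings do to a Document Conversion result. It is only an illustration, not Solr's implementation: the names `resolve` and `split_documents` and the sample payload are made up, and the handling of a split path that lands on a scalar is an assumption chosen to match the empty index reported in the question.

```python
def resolve(node, parts):
    """Collect every value reached by following `parts`, fanning out over lists."""
    if isinstance(node, list):
        found = []
        for item in node:
            found.extend(resolve(item, parts))
        return found
    if not parts:
        return [node]
    if isinstance(node, dict) and parts[0] in node:
        return resolve(node[parts[0]], parts[1:])
    return []


def split_documents(payload, split, fields):
    """Build one document per object found at `split`, filling `fields` from it."""
    split_parts = split.strip("/").split("/")
    docs = []
    for record in resolve(payload, split_parts):
        if not isinstance(record, dict):
            # Assumption: a scalar such as an id has no child fields to map,
            # so it produces no document.
            continue
        doc = {}
        for name, path in fields.items():
            # Field paths are taken relative to the split path.
            relative = path.strip("/").split("/")[len(split_parts):]
            values = resolve(record, relative)
            if values:
                doc[name] = values[0] if len(values) == 1 else values
        docs.append(doc)
    return docs


# Made-up payload in the general shape of an answer_units conversion result.
payload = {
    "answer_units": [
        {"id": "a1", "title": "Intro",
         "content": [{"media_type": "text/plain", "text": "Hello"}]},
        {"id": "a2", "title": "Usage",
         "content": [{"media_type": "text/plain", "text": "World"}]},
    ]
}
fields = {
    "id": "/answer_units/id",
    "title": "/answer_units/title",
    "body": "/answer_units/content/text",
}

good = split_documents(payload, "/answer_units", fields)
bad = split_documents(payload, "/answer_units/id", fields)
print(len(good), len(bad))
```

In this model, splitting at /answer_units gives one document per answer unit with id, title, and body filled in, while splitting at /answer_units/id gives none, because each split target is just a string.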

Solution 1 (0 votes, accepted 2016-07-21 21:39:56)
