
Post to Bluemix Retrieve_and_Rank gives status 0, but does not work

I am trying to index some web pages in the Bluemix Retrieve and Rank service. I crawled my seeds with Nutch 1.11, dumped the crawled data (about 9000 URLs) as files, and posted those that can be posted directly, e.g. XML files, to my collection:

Post_url = '"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/%s/solr/%s/update"' %(solr_cluster_id, solr_collection_name)
cmd ='''curl -X POST -H %s -u %s %s --data-binary @%s''' %(Cont_type_xml, solr_credentials, Post_url, myfilename)
subprocess.call(cmd,shell=True)

and converted the rest to JSON with the Bluemix Document Conversion service:

doc_conv_url = '"https://gateway.watsonplatform.net/document-conversion/api/v1/convert_document?version=2015-12-15"'
cmd ='''curl -X POST -u %s -F config="{\\"conversion_target\\":\\"answer_units\\"}" -F file=@%s %s''' %(doc_conv_credentials, myfilename, doc_conv_url)
process = subprocess.Popen(cmd, shell= True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

and then saved the JSON results in a JSON file and posted it to my collection:

Post_converted_url = '"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/%s/solr/%s/update/json/docs?commit=true&split=/answer_units/id&f=id:/answer_units/id&f=title:/answer_units/title&f=body:/answer_units/content/text"' %(solr_cluster_id, solr_collection_name)
cmd ='''curl -X POST -H %s -u %s %s --data-binary @%s''' %(Cont_type_json, solr_credentials, Post_converted_url, Path_jsonFile)
subprocess.call(cmd,shell=True)

Everything seems to work. The JSON file looks as it should, and when I post the data I receive status 0, which I thought meant the post had succeeded. But when I send queries:

pysolr_client = retrieve_and_rank.get_pysolr_client(solr_cluster_id, solr_collection_name)
results = pysolr_client.search(Query_term)
print(results.docs)

the result is empty. It finds nothing. I have done the same before, with the same command structure, and it worked. I just created a new collection and now it doesn't work.

Has my data been indexed? If so, why does the query not work? When I get the usage statistics for my Solr cluster, the result is:

{"disk_usage":{"used_bytes":2210,"total_bytes":34359738368,"used":"2.1582 KB","total":"32 GB","percent_used":6.4319465309381485E-6},
"memory_usage":{"used_bytes":2069028864,"total_bytes":4194304000,"used":"1.9269 GB","total":"3.9063 GB","percent_used":49.3294921875}}

which I thought meant my data had been indexed and stored in my cluster. But I just realized that the disk usage and memory usage do not change when I post my data. Does that mean the post is not going through, even though I receive status 0? If so, any ideas on what the problem is and why it is happening?

Does it have anything to do with the solr_config?
Any help or ideas on how to get results from a query would be highly appreciated.

The URL used for posting the converted files has to split the data by /answer_units, not by /answer_units/id, so it should be:

Post_converted_url = '"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/%s/solr/%s/update/json/docs?commit=true&split=/answer_units&f=id:/answer_units/id&f=title:/answer_units/title&f=body:/answer_units/content/text"' %(solr_cluster_id, solr_collection_name)

Pay attention to the split=/answer_units part.
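As a rough sketch of the same fix, the URL can be assembled from its query parameters so that the split path is easy to spot and change, and the collection can then be checked for documents directly rather than by watching disk usage. The helper names build_update_url and count_indexed are made up for this illustration, and count_indexed assumes it is given a pysolr-style client like the one returned by get_pysolr_client in the question:

```python
from urllib.parse import urlencode

# Base endpoint taken from the URLs in the question.
BASE = "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1"


def build_update_url(solr_cluster_id, solr_collection_name):
    """Build the /update/json/docs URL with split=/answer_units.

    Hypothetical helper: it only assembles the same URL shown above.
    """
    params = [
        ("commit", "true"),
        ("split", "/answer_units"),  # split per answer unit, not per id
        ("f", "id:/answer_units/id"),
        ("f", "title:/answer_units/title"),
        ("f", "body:/answer_units/content/text"),
    ]
    # Keep "/" and ":" unescaped so the result matches the hand-written URL.
    query = urlencode(params, safe="/:")
    return "%s/solr_clusters/%s/solr/%s/update/json/docs?%s" % (
        BASE, solr_cluster_id, solr_collection_name, query)


def count_indexed(solr_client):
    """Return the number of documents matching *:* (every document).

    Assumes a pysolr-style client whose search() result exposes .hits.
    """
    return solr_client.search("*:*", rows=0).hits
```

If count_indexed(pysolr_client) still returns 0 after posting with the corrected URL, the documents were not indexed, whatever status the POST reported.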
