Indexing JSON and Atom from the Google API and Twitter API in Solr

I am an intern building a search engine for my company. In addition to a web crawler, this search engine should search for data using different APIs and then index the returned data. I am thinking about using Solr to index the returned data.

First, I would like your advice on whether this is a good idea. I also want to know whether I will run into issues indexing JSON and Atom, since I do not know the names of the tags in advance.

Thank you.

Please go ahead; you are heading in the right direction. As for the second part of your question: yes, you will encounter problems while indexing, such as schema issues and indexing nested JSON. These can be resolved using plugins or the Data Import Handler (DIH).
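To illustrate the nested JSON and unknown-field-name problem, here is a minimal sketch of one common workaround. It is not the plugin or DIH route mentioned above: it flattens each record on the client side and relies on Solr dynamic fields. It assumes your schema defines the dynamic-field patterns `*_s`, `*_l`, `*_d`, `*_b`, and `*_ss` (the stock configsets ship with these, but check yours). The `flatten` helper and the sample record are hypothetical.

```python
import json


def _suffix(value):
    """Pick a dynamic-field suffix from the Python type of a scalar value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "_b"
    if isinstance(value, int):
        return "_l"  # long, since API ids often exceed 32 bits
    if isinstance(value, float):
        return "_d"
    return "_s"


def flatten(obj, prefix=""):
    """Flatten nested JSON into one level of Solr dynamic-field names."""
    doc = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse, joining the key path with underscores.
            doc.update(flatten(value, prefix=name + "_"))
        elif isinstance(value, list):
            # Lists become multi-valued strings; nested items are kept as JSON text.
            doc[name + "_ss"] = [
                json.dumps(v) if isinstance(v, (dict, list)) else str(v)
                for v in value
            ]
        elif value is not None:
            doc[name + _suffix(value)] = value
    return doc


# Hypothetical record, shaped loosely like an API response.
record = json.loads(
    '{"id": 42, "text": "hello", '
    '"user": {"name": "ann", "verified": true}, "tags": ["a", "b"]}'
)
solr_doc = flatten(record)
# Solr still needs its uniqueKey field; here we assume it is called "id".
solr_doc["id"] = str(record["id"])
```

The resulting dictionary can then be sent to Solr's JSON update endpoint as an ordinary flat document. The trade-off is that field names now carry type suffixes, so queries have to use names such as `user_name_s`.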

First of all, you can index Atom and JSON data using Solr. There are two ways to do that:

1) Parse the data and map each field of the parsed data to a field in Solr.
2) Do not parse the data; instead, give whole files to Apache Tika, which does the job. One way to do that is to save the data in a file and index the file using update/extract.
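The two options above can be sketched roughly as follows. For option 1, `atom_to_docs` parses an Atom feed with the standard library and maps a few elements to Solr fields. For option 2, `extract_request` only builds (and does not send) a POST to the `/update/extract` handler, passing the document id via the `literal.id` parameter. The core URL, the field names `title_t` and `updated_dt`, the sample feed, and both helper names are assumptions made for illustration; the extract handler also has to be enabled in your Solr configuration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"      # Atom XML namespace
SOLR = "http://localhost:8983/solr/mycore"  # hypothetical core URL

# Hypothetical sample feed, used only to exercise the parser.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:example.org,2024:1</id>
    <title>Hello</title>
    <updated>2024-01-01T00:00:00Z</updated>
  </entry>
</feed>"""


def atom_to_docs(xml_text):
    """Option 1: parse the feed and map each entry to a flat Solr document."""
    root = ET.fromstring(xml_text)
    docs = []
    for entry in root.iter(ATOM + "entry"):
        docs.append({
            "id": entry.findtext(ATOM + "id"),
            "title_t": entry.findtext(ATOM + "title"),
            "updated_dt": entry.findtext(ATOM + "updated"),
        })
    return docs


def extract_request(raw_bytes, doc_id):
    """Option 2: build a request that hands the raw file to Tika via Solr."""
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return urllib.request.Request(
        f"{SOLR}/update/extract?{params}",
        data=raw_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


docs = atom_to_docs(FEED)
req = extract_request(FEED.encode("utf-8"), "feed-1")
# To actually send it, against a running Solr: urllib.request.urlopen(req)
```

Option 1 gives you control over which fields are searchable, while option 2 is quicker to set up but leaves the field mapping to Tika's extraction.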
