
Solr 4 - Indexing posted text file

I'm trying to create a field type called "sku", which is indexed with the following analyzer:

<fieldType name="sku" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="(SKU|Part(\sNumber)?):?\s(\[0-9-\]+)" group="3"/>
  </analyzer>
</fieldType>

This is based on the documentation here: http://lucidworks.lucidimagination.com/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer

I already have a Java program that posts to the Solr server successfully; however, it is not extracting the SKU from any of the files and indexing it. Here is my Java code:

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(arg0, arg0.getName());

up.setParam("literal.id", arg0.getName());
up.setParam("uprefix", "attr_");
up.setParam("fmap.content", "attr_content");

up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

server.request(up);

Any help appreciated.

I understand I can parse the text files myself, extract the SKUs, and post them to the server as parameters, but I thought Solr could do this for me?

It is hard to tell what is going on, because there are several steps in the middle.

For example, what does your schema.xml definition look like? Is the field definitely using sku as its type (and not, say, string)? Then, what is the field name (attr_sku?), and does the extract handler mapping actually map to it properly? The extract handler usually sends metadata as individual fields and then all of the file content as one big long field. Is the sku somewhere in the metadata?

I would do a copyField into a field that does no processing (a plain string field, for example) and see whether the content actually makes it into the Solr field. Then I would start troubleshooting the regex itself.
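For the regex step, one option is to run the pattern through plain java.util.regex outside Solr first. The sketch below is only an illustration: the class name SkuPatternCheck, the helper extract, and the sample text are made up, and it only approximates what the tokenizer does with group="3" (one token per match, taken from capture group 3). It compares the pattern exactly as written in the question with a variant where the brackets are not escaped. If the backslashes before [ and ] really are in schema.xml, rather than being a copy-paste artifact, the pattern would look for the literal text "[0-9-]" instead of digits and hyphens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkuPatternCheck {
    // Pattern as it appears in the question: \[ and \] are literal brackets.
    static final String ORIGINAL = "(SKU|Part(\\sNumber)?):?\\s(\\[0-9-\\]+)";
    // Assumed intent: a character class of digits and hyphens.
    static final String FIXED = "(SKU|Part(\\sNumber)?):?\\s([0-9-]+)";

    // Collect capture group 3 of every match, one entry per match.
    static List<String> extract(String regex, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            tokens.add(m.group(3));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical sample content, not taken from the question.
        String text = "Widget SKU: 12-345 replaces Part Number 987-65.";
        System.out.println("original: " + extract(ORIGINAL, text));
        System.out.println("fixed:    " + extract(FIXED, text));
    }
}
```

If the unescaped variant finds the SKUs in a sample of your file content but nothing shows up in Solr, the problem is more likely in the field mapping than in the regex.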
