How can I get the terms of a Lucene document field after its tokens are analyzed?

I'm using Lucene 5.1.0. After analyzing and indexing a document, I would like to get a list of all the indexed terms that belong to that specific document.

{
        File[] files = FILES_TO_INDEX_DIRECTORY.listFiles();
        for (File file : files) {
            Document document = new Document();
            Reader reader = new FileReader(file);
            // A TextField built from a Reader is tokenized and indexed, but not stored.
            document.add(new TextField("fieldname", reader));
            iwriter.addDocument(document);
        }

        iwriter.close();
        IndexReader indexReader = DirectoryReader.open(directory);
        int maxDoc = indexReader.maxDoc();
        for (int i = 0; i < maxDoc; i++) {
            Document doc = indexReader.document(i);
            // getValues() reads stored field values only, so nothing comes back here.
            String[] terms = doc.getValues("fieldname");
        }
}

The terms array comes back with no values. Is there a way to get the terms saved for each document?

Here is sample code for the answer, using a TokenStream:

TokenStream ts = analyzer.tokenStream("myfield", reader);
// The Analyzer class will construct the Tokenizer, TokenFilter(s), and CharFilter(s),
//   and pass the resulting Reader to the Tokenizer.
OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
CharTermAttribute charTermAttribute = ts.addAttribute(CharTermAttribute.class);

try {
    ts.reset(); // Resets this stream to the beginning. (Required)
    while (ts.incrementToken()) {
        // Use AttributeSource.reflectAsString(boolean)
        // for token stream debugging.
        System.out.println("token: " + ts.reflectAsString(true));
        String term = charTermAttribute.toString();
        System.out.println(term);
    }
    ts.end();   // Perform end-of-stream operations, e.g. set the final offset.
} finally {
    ts.close(); // Release resources associated with this stream.
}
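
The loop above only prints each token. To end up with the per-document list the question asks for, each term can be appended to a collection keyed by document instead. The sketch below shows just that bookkeeping in plain Java, with no Lucene dependency. The class `PerDocumentTerms` and its methods are hypothetical names invented for illustration, and `analyze` is only a rough stand-in for an analyzer (it lowercases and splits on non-alphanumeric characters), so its output will not match what a real Lucene Analyzer produces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PerDocumentTerms {

    // Hypothetical stand-in for the analyzer: lowercases the text and splits it
    // on runs of characters that are neither letters nor digits. A real Lucene
    // Analyzer may also remove stop words, stem terms, and so on.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // split() can yield a leading empty string
                terms.add(token);
            }
        }
        return terms;
    }

    // Analyzes each document once and remembers its terms under the document name,
    // preserving the order in which documents were supplied.
    public static Map<String, List<String>> termsByDocument(Map<String, String> documents) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : documents.entrySet()) {
            result.put(entry.getKey(), analyze(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("a.txt", "Lucene indexes terms, not raw text.");
        documents.put("b.txt", "Stored fields keep the raw text.");
        termsByDocument(documents).forEach(
                (name, terms) -> System.out.println(name + " -> " + terms));
    }
}
```

With Lucene in place, the body of `analyze` would be the reset / incrementToken / end / close loop shown above, adding `charTermAttribute.toString()` to the list on each iteration.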
