
How can I get the terms of a Lucene document's field after it has been analyzed?

I am using Lucene 5.1.0. After a document has been analyzed and indexed, I want to get a list of all the indexed terms that belong to that specific document.

{        
        File[] files = FILES_TO_INDEX_DIRECTORY.listFiles();
        for (File file : files) {
            Document document = new Document();
            Reader reader = new FileReader(file);
            document.add(new TextField("fieldname",reader));            
            iwriter.addDocument(document);
        }  

        iwriter.close();
        IndexReader indexReader = DirectoryReader.open(directory);
        int maxDoc=indexReader.maxDoc();
        for (int i=0; i < maxDoc; i++) {
            Document doc=indexReader.document(i);
            String[] terms = doc.getValues("fieldname");
        }
}

`terms` comes back null. Is there a way to get the terms saved for each document?

Here is sample code for the answer, using a TokenStream:

TokenStream ts = analyzer.tokenStream("myfield", reader);
// The Analyzer class will construct the Tokenizer, TokenFilter(s), and CharFilter(s),
//   and pass the resulting Reader to the Tokenizer.
OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
CharTermAttribute charTermAttribute = ts.addAttribute(CharTermAttribute.class);

try {
    ts.reset(); // Resets this stream to the beginning. (Required)
    while (ts.incrementToken()) {
        // Use AttributeSource.reflectAsString(boolean)
        // for token stream debugging.
        System.out.println("token: " + ts.reflectAsString(true));
        String term = charTermAttribute.toString();
        System.out.println(term);
    }
    ts.end();   // Perform end-of-stream operations, e.g. set the final offset.
} finally {
    ts.close(); // Release resources associated with this stream.
}
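
For context on why the question's code finds nothing: `doc.getValues(...)` only returns stored field values, and a `TextField` built from a `Reader` is indexed but not stored. Re-running the analyzer over the same text is one way to see the terms it produced. The loop above could be wrapped in a small helper that collects the terms into a list, roughly like the sketch below. The class name `AnalyzedTerms` and the method `termsOf` are made up for illustration, and `StandardAnalyzer` is only an assumption; use whichever analyzer the index was built with, otherwise the terms will not match what was indexed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper; the class and method names are illustrative only.
class AnalyzedTerms {

    // Runs `text` through `analyzer` and returns the emitted terms in stream order.
    static List<String> termsOf(Analyzer analyzer, String field, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // try-with-resources closes the TokenStream even if analysis throws.
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                       // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(termAtt.toString());
            }
            ts.end();                         // end-of-stream bookkeeping
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: StandardAnalyzer stands in for the analyzer used at index time.
        try (Analyzer analyzer = new StandardAnalyzer()) {
            for (String term : termsOf(analyzer, "fieldname", "Indexing documents with Lucene")) {
                System.out.println(term);
            }
        }
    }
}
```

This version takes the text as a `String`; `Analyzer.tokenStream` also accepts a `Reader`, as in the snippet above, if the file content should be streamed instead. Which terms come out depends on the analyzer and its stop-word settings.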

