How to index big text fields in Lucene 4.10.1

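I am taking my first steps with Lucene (version 4.10.1), and my current goal is to index a text field read from a file that is 100 KB in size. Because I assumed the text would not fit into a String, I put the text from the file into a byte array. However, when I run the program, Lucene reports: "Fields with BytesRef values cannot be indexed".

So the question is: how do I index big text fields?

Here is the code: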

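import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Main {

    public static void main(String[] args) {

        try {
            Directory indexDir = FSDirectory.open(new File("testIndex"));
            Analyzer analyzer = new StandardAnalyzer();
            IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_4_10_1, analyzer);
            IndexWriter indexWriter = new IndexWriter(indexDir, conf);
            Path path = Paths.get("text.txt");
            // The whole file is read as raw bytes, not as a String
            byte[] text = Files.readAllBytes(path);

            long startTime = System.currentTimeMillis();
            for (int i = 0; i < 100; i++) {
                Document doc = new Document();
                // Indexed, tokenized and stored field without norms or term vectors
                FieldType fieldType = new FieldType();
                fieldType.setIndexed(true);
                fieldType.setTokenized(true);
                fieldType.setStored(true);
                fieldType.setOmitNorms(true);
                fieldType.setStoreTermVectors(false);
                fieldType.setStoreTermVectorOffsets(false);
                fieldType.setStoreTermVectorPayloads(false);
                fieldType.setStoreTermVectorPositions(false);
                // This line triggers the error: a byte[] value combined with an indexed field type
                Field title = new Field("text" + i, text, fieldType);

                doc.add(title);

                indexWriter.addDocument(doc);
            }
            long endTime = System.currentTimeMillis();
            long elapsedTime = endTime - startTime;
            System.out.println("Elapsed Time in Ms: " + elapsedTime);

            indexWriter.close();

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}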

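Solved it with a StringBuilder: read the file into a String and pass that String to the Field constructor instead of the byte array. A 100 KB text fits into a Java String without any problem.

Code: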

            Path path = Paths.get("text.txt");
            StringBuilder stringBuilder = new StringBuilder();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader reader = Files.newBufferedReader(path, Charset.defaultCharset())) {
                String line;
                while ((line = reader.readLine()) != null) {
                    stringBuilder.append(line).append("\n");
                }
            }
            // Pass this String (not the byte[]) to the Field constructor
            String text = stringBuilder.toString();
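
As an alternative to the line-by-line loop, the bytes that `Files.readAllBytes` returns can be decoded directly into a String. The following is only a minimal sketch, assuming the file is UTF-8 encoded; the class name `ReadWholeFile` and the helper `readText` are hypothetical names chosen for illustration. The resulting String is what would be handed to the `Field` constructor in place of the byte array.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadWholeFile {

    // Decodes the whole file into one String.
    // Assumption: the file is UTF-8; change the charset if it was written differently.
    static String readText(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway file of at least 100 KB so the sketch runs on its own.
        Path path = Files.createTempFile("text", ".txt");
        StringBuilder sample = new StringBuilder();
        while (sample.length() < 100 * 1024) {
            sample.append("lorem ipsum dolor sit amet\n");
        }
        Files.write(path, sample.toString().getBytes(StandardCharsets.UTF_8));

        // A text of this size is held by a single String without trouble.
        String text = readText(path);
        System.out.println("Characters read: " + text.length());

        Files.delete(path);
    }
}
```

Unlike the `readLine` loop, this keeps the file's original line terminators and does not append a trailing newline, so the two approaches can produce slightly different strings for the same file.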
