How to index big text fields in Lucene 4.10.1
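I'm taking my first steps with Lucene (version 4.10.1). My current goal is to index a text field whose content comes from a 100 KB file. Because I thought the text would not fit into a String, I read the file into a byte array instead. However, when I run the program, Lucene fails with the error "Fields with BytesRef values cannot be indexed".
So the question is: how do I index large text fields?
Here is the code: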
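    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class Main {
        public static void main(String[] args) {
            try {
                Directory indexDir = FSDirectory.open(new File("testIndex"));
                Analyzer analyzer = new StandardAnalyzer();
                IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_4_10_1, analyzer);
                IndexWriter indexWriter = new IndexWriter(indexDir, conf);
                Path path = Paths.get("text.txt");
                // The whole file is read as raw bytes
                byte[] text = Files.readAllBytes(path);
                long startTime = System.currentTimeMillis();
                for (int i = 0; i < 100; i++) {
                    Document doc = new Document();
                    FieldType fieldType = new FieldType();
                    fieldType.setIndexed(true);
                    fieldType.setTokenized(true);
                    fieldType.setStored(true);
                    fieldType.setOmitNorms(true);
                    fieldType.setStoreTermVectors(false);
                    fieldType.setStoreTermVectorOffsets(false);
                    fieldType.setStoreTermVectorPayloads(false);
                    fieldType.setStoreTermVectorPositions(false);
                    // Fails here with IllegalArgumentException: the byte[] is wrapped in a
                    // BytesRef, and Lucene refuses to index BytesRef-valued fields
                    Field title = new Field("text" + i, text, fieldType);
                    doc.add(title);
                    indexWriter.addDocument(doc);
                }
                long endTime = System.currentTimeMillis();
                long elapsedTime = endTime - startTime;
                System.out.println("Elapsed Time in Ms: " + elapsedTime);
                indexWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }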
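The byte array is not actually needed for text of this size. A Java String's length is an int, so, heap permitting, it can hold on the order of a billion characters, far more than 100 KB. The following is a minimal, stdlib-only sketch (the class name StringSizeSketch and the helper buildText are hypothetical, for illustration only) that builds about 100 KB of text and round-trips it through UTF-8 bytes, the same kind of byte[] that Files.readAllBytes returns:

```java
import java.nio.charset.StandardCharsets;

public class StringSizeSketch {
    // Hypothetical helper: builds a String of at least minChars characters
    static String buildText(int minChars) {
        StringBuilder sb = new StringBuilder(minChars);
        while (sb.length() < minChars) {
            sb.append("lorem ipsum ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // About 100 KB of ASCII text, roughly the size of the file in the question
        String text = buildText(100 * 1024);
        // The kind of byte[] that Files.readAllBytes returns for such a file
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // Decoding the bytes back gives an ordinary String with the same content
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(decoded.length() + " chars, identical: " + decoded.equals(text));
    }
}
```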
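Solved it with a StringBuilder: read the file into a String and pass that String, instead of the byte array, to the Field constructor (the Field(String, String, FieldType) overload accepts an indexed field type).
Code: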
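    // Needs: java.io.BufferedReader, java.io.IOException, java.nio.charset.Charset,
    //        java.nio.file.Files, java.nio.file.Path, java.nio.file.Paths
    Path path = Paths.get("text.txt");
    StringBuilder stringBuilder = new StringBuilder();
    // try-with-resources closes the reader even if readLine() throws
    try (BufferedReader reader = Files.newBufferedReader(path, Charset.defaultCharset())) {
        String line;
        while ((line = reader.readLine()) != null) {
            stringBuilder.append(line).append("\n");
        }
    }
    String text = stringBuilder.toString();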
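Reading line by line changes the text slightly: every line ending (including \r\n) becomes \n, and a trailing \n is always appended. A stdlib-only sketch comparing this with decoding all bytes at once (the class ReadTextSketch and its methods are hypothetical; it assumes Java 8 or later for UncheckedIOException, and in practice the bytes would come from Files.readAllBytes(path) as in the question):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadTextSketch {
    // Same approach as the answer: read line by line, re-append "\n" after each line
    static String viaLines(byte[] bytes, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), cs))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append("\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Alternative: decode all bytes in one step, keeping the original line endings
    static String viaDecode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] bytes = "first\r\nsecond".getBytes(StandardCharsets.UTF_8);
        System.out.println(viaLines(bytes, StandardCharsets.UTF_8).equals("first\nsecond\n"));  // true
        System.out.println(viaDecode(bytes, StandardCharsets.UTF_8).equals("first\r\nsecond")); // true
    }
}
```

Either String can be indexed. The difference mainly affects the stored field value; since line breaks only separate words, StandardAnalyzer should produce the same tokens in both cases.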
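Notice: this technical post is licensed under CC BY-SA 4.0. If you republish it, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.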