I need to merge Lucene indexes kept on HDFS, so I wrote a customized version of the standard merge tool provided by Lucene. The code is given below:
HdfsDirectory mergedIndex = new HdfsDirectory(new Path("/mergedindex"), new Configuration());
IndexWriter writer = new IndexWriter(mergedIndex,
        new IndexWriterConfig(new WhitespaceAnalyzer(Version.LUCENE_CURRENT))
                .setOpenMode(OpenMode.CREATE));
Directory[] indexes = new BaseDirectory[args.length - 1];
for (int i = 1; i < args.length; i++) {
    indexes[i - 1] = new HdfsDirectory(new Path(args[i]), new Configuration());
}
System.out.println("Merging...");
writer.addIndexes(indexes);
System.out.println("Full merge...");
writer.forceMerge(1);
writer.close();
But it fails because it cannot obtain the HDFS lock on the directory within the timeout. The timeout value is hard-coded in the Lucene library as 1000 milliseconds.
Exception trace:

Exception in thread "main" org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: org.apache.solr.store.hdfs.HdfsLockFactory$HdfsLock@21539796
    at org.apache.lucene.store.Lock.obtain(Lock.java:89)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:776)
    at com.test.hadoop.solr.indexer.IndexMergeTool.main(IndexMergeTool.java:30)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Is there any mechanism to overcome this so that I can merge the index on HDFS itself?
Thanks in advance, Arun
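(A note on the timeout: in Lucene 4.x/5.x the 1000 ms value is the default, `IndexWriterConfig.WRITE_LOCK_TIMEOUT`, not an unchangeable constant; it can be raised per writer via `IndexWriterConfig.setWriteLockTimeout`. A minimal sketch, assuming those Lucene versions and reusing the `mergedIndex` directory from the code above:)

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.util.Version;

// Sketch only: raise the write-lock wait from the 1000 ms default to 30 s,
// so a briefly-held HdfsLock does not immediately fail the merge.
IndexWriterConfig config = new IndexWriterConfig(new WhitespaceAnalyzer(Version.LUCENE_CURRENT))
        .setOpenMode(OpenMode.CREATE);
config.setWriteLockTimeout(30000); // milliseconds to keep retrying Lock.obtain()
IndexWriter writer = new IndexWriter(mergedIndex, config);
```

This only helps if the lock is held transiently; if a previous writer crashed and left a stale lock behind, no timeout will succeed and the lock file itself has to be removed.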
Please make sure to delete the lock file under the index folder and try again.
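Since the index lives on HDFS, the stale lock file can be removed with the Hadoop FileSystem API rather than a shell command. A sketch, assuming the lock file is named `write.lock` (Lucene's `IndexWriter.WRITE_LOCK_NAME`, which `HdfsLockFactory` places inside the index directory) and that the index path is the `/mergedindex` used in the question:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLockCleaner {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "write.lock" is the conventional Lucene lock-file name; verify the
        // actual name in your index directory before deleting.
        Path lock = new Path("/mergedindex", "write.lock");
        FileSystem fs = lock.getFileSystem(conf);
        if (fs.exists(lock)) {
            // Only safe when no live IndexWriter still holds the lock,
            // otherwise two writers could corrupt the index.
            fs.delete(lock, false);
            System.out.println("Deleted stale lock: " + lock);
        }
    }
}
```

Make sure the writer that created the lock is really dead before deleting; removing a lock that is still held defeats its purpose.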