Lucene changing from RAMDirectory to FSDirectory - Content-Field missing

Question

I'm just starting out with Lucene, and I got stuck on a problem when switching from a RAMDirectory to an FSDirectory:

First my code:

    private static IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43,
            new StandardAnalyzer(Version.LUCENE_43));
    Directory DIR = FSDirectory.open(new File(INDEXLOC)); //INDEXLOC = "path/to/dir/"
    // RAMDirectory DIR = new RAMDirectory();

    // Index some made up content      
    IndexWriter writer =
            new IndexWriter(DIR, iwc);


    // Store both position and offset information
    FieldType type = new FieldType();
    type.setStored(true);
    type.setStoreTermVectors(true);
    type.setStoreTermVectorOffsets(true);
    type.setStoreTermVectorPositions(true);
    type.setIndexed(true);
    type.setTokenized(true);

    IDocumentParser p = DocumentParserFactory.getParser(f);
    ArrayList<ParserDocument> DOCS = p.getParsedDocuments();

    for (int i = 0; i < DOCS.size(); i++) {
        Document doc = new Document();
        Field id = new StringField("id", "doc_" + i, Field.Store.YES);
        doc.add(id);
        Field text = new Field("content", DOCS.get(i).getContent(), type);
        doc.add(text);
        writer.addDocument(doc);
    }
    writer.close();
    // Get a searcher
    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(DIR));
    // Do a search using SpanQuery
    SpanTermQuery fleeceQ = new SpanTermQuery(new Term("content", "zahl"));
    TopDocs results = searcher.search(fleeceQ, 10);
    for (int i = 0; i < results.scoreDocs.length; i++) {
        ScoreDoc scoreDoc = results.scoreDocs[i];
        System.out.println("Score Doc: " + scoreDoc);
    }
    IndexReader reader = searcher.getIndexReader();

    AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
    Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
    Spans spans = fleeceQ.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.numDocs()), termContexts);
    int window = 2;// get the words within two of the match
    while (spans.next()) {
        Map<Integer, String> entries = new TreeMap<Integer, String>();
        System.out.println("Doc: " + spans.doc() + " Start: " + spans.start() + " End: " + spans.end());
        int start = spans.start() - window;
        int end = spans.end() + window;
        Terms content = reader.getTermVector(spans.doc(), "content");
        TermsEnum termsEnum = content.iterator(null);
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            // could store the BytesRef here, but String is easier for this
            // example
            String s = term.utf8ToString();
            DocsAndPositionsEnum positionsEnum = termsEnum.docsAndPositions(null, null);
            if (positionsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                int i = 0;
                int position = -1;
                while (i < positionsEnum.freq() && (position = positionsEnum.nextPosition()) != -1) {
                    if (position >= start && position <= end) {
                        entries.put(position, s);
                    }
                    i++;
                }
            }
        }
        System.out.println("Entries:" + entries);
    }

It's just some code I found on a great website and wanted to try. Everything works fine with the RAMDirectory, but when I switch to my FSDirectory it throws a NullPointerException:

    Exception in thread "main" java.lang.NullPointerException
        at com.org.test.TextDB.myMethod(TextDB.java:184)
        at com.org.test.Main.main(Main.java:31)

The statement Terms content = reader.getTermVector(spans.doc(), "content"); seems to find nothing and returns null, hence the exception. But why? With the RAMDirectory everything works fine.

It seems that either the IndexWriter did not write the field "content" to the index properly, or the reader did not read it properly (I really don't know which). But why would it be written in a RAMDirectory and not in an FSDirectory?

Does anybody have an idea?

Answer 1

I gave this a quick test run, and I can't reproduce your issue.

I think the most likely issue here is old documents in your index. The way this is written, every run adds more documents to the index. Documents from previous runs are not deleted or overwritten; they just stick around. So if you have run this before on the same directory, perhaps before you added the line type.setStoreTermVectors(true);, some of your results may be these old documents without term vectors, and reader.getTermVector(...) returns null for a document that does not store term vectors.

Of course, anything indexed in a RAMDirectory will be dropped as soon as execution finishes, so the issue would not occur in that case.
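
The difference can be illustrated without Lucene. The following is only an analogy under stated assumptions, not Lucene code: the class StaleIndexAnalogy, its methods, and the record format are all made up for illustration. A text file that is appended to stands in for the FSDirectory, and a fresh list stands in for the RAMDirectory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical analogy: none of these names come from Lucene.
class StaleIndexAnalogy {

    // One "run" of the indexing program against the on-disk store.
    // It appends a record and never removes what earlier runs wrote.
    static void runOnDisk(Path store, boolean withTermVectors) {
        String record = "doc_0 termVectors=" + withTermVectors;
        try {
            Files.write(store, Collections.singletonList(record), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One "run" against an in-memory store: a fresh list, nothing carried over.
    static List<String> runInMemory(boolean withTermVectors) {
        List<String> store = new ArrayList<>();
        store.add("doc_0 termVectors=" + withTermVectors);
        return store;
    }

    // An earlier run without term vectors, then the current run with them.
    // Returns everything the on-disk store holds afterwards.
    static List<String> demoOnDisk() {
        try {
            Path store = Files.createTempFile("index-analogy", ".txt");
            try {
                runOnDisk(store, false); // earlier run
                runOnDisk(store, true);  // current run
                return Files.readAllLines(store, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(store);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("on disk:   " + demoOnDisk());
        System.out.println("in memory: " + runInMemory(true));
    }
}
```

The on-disk store ends up with both records, including the old one without term vectors, while the in-memory store only ever holds the record from the current run.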

The simple solution is to delete the index directory and run it again.
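
If you would rather do that from code than by hand, here is a minimal sketch using only the JDK (no Lucene calls), assuming Java 8 or later. The class name IndexDirCleaner and the method deleteIndexDir are made up for this example, and "path/to/dir/" is just the placeholder from the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene.
class IndexDirCleaner {

    // Recursively deletes indexDir and everything under it.
    // Does nothing if the directory does not exist.
    static void deleteIndexDir(Path indexDir) {
        if (!Files.exists(indexDir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(indexDir)) {
            // Reverse order visits children before their parent directories,
            // so every directory is already empty when it is deleted.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder path taken from the question (INDEXLOC).
        deleteIndexDir(Paths.get("path/to/dir/"));
    }
}
```

Call it before the IndexWriter is created, so the writer starts from an empty directory.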

If you want to start with a fresh index every time you run this, you can set that up through the IndexWriterConfig:

    private static IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43,
            new StandardAnalyzer(Version.LUCENE_43));
    // Set this before passing iwc to the IndexWriter: CREATE replaces any existing index
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

That's a guess, of course, but it seems consistent with the behavior you've described.
