Searching issue using Lucene.Net

Question

I'm trying to use the Lucene.Net search engine. I followed some documentation on the website, but I must have missed something, since it doesn't work as expected.

Here is the code:

        var stringBuilder = new StringBuilder();
        var pdfReader = new PdfReader(@"c:\Test\testRoot.pdf");
        for (var page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            stringBuilder.Append(PdfTextExtractor.GetTextFromPage(pdfReader, page) + " ");
        }
        if (stringBuilder.ToString().Contains("new"))
        {
            Console.WriteLine("New is present in the text!");
        }
        const string strIndexDir = @"C:\Index";
        Directory indexDir = FSDirectory.Open(strIndexDir);
        Analyzer std = new StandardAnalyzer(Version.LUCENE_29);
        var idwx = new IndexWriter(indexDir, std, true, IndexWriter.MaxFieldLength.UNLIMITED);
        var doc = new Document();            
        var fdl = new Field("testRoot", stringBuilder.ToString(), Field.Store.YES, Field.Index.ANALYZED);
        doc.Add(fdl);
        idwx.AddDocument(doc);
        idwx.Optimize();
        idwx.Dispose();
        Console.WriteLine("Indexing Done !");


        var parser = new QueryParser(Version.LUCENE_29, "new", std);
        var qry = parser.Parse(parser.Field);
        Directory directory = FSDirectory.Open(new System.IO.DirectoryInfo(strIndexDir));
        Searcher srch = new IndexSearcher(IndexReader.Open(directory, true));
        TopScoreDocCollector cllstr = TopScoreDocCollector.Create(100, true);
        ScoreDoc[] hits = cllstr.TopDocs().ScoreDocs;
        for (int i = 0; i < hits.Length; i++)
        {
            int docId = hits[i].Doc;
            float score = hits[i].Score;
            Document docy = srch.Doc(docId);
            Console.WriteLine(docy.Get("text"));
        }
        Console.ReadLine();

The word "new" is definitely present in the text of my PDF, since execution enters the 'if' block.

But at the end, when I search for the match, nothing is returned.

EDIT:

I made a few changes, but it still doesn't work:

        var stringBuilder = new StringBuilder();
        var pdfReader = new PdfReader(@"c:\Test\testRoot.pdf");
        for (var page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            stringBuilder.Append(PdfTextExtractor.GetTextFromPage(pdfReader, page) + " ");
        }
        if (stringBuilder.ToString().Contains("new"))
        {
            Console.WriteLine("New is present in the text!");
        }
        const string strIndexDir = @"C:\Index";
        Directory indexDir = FSDirectory.Open(strIndexDir);
        Analyzer std = new StandardAnalyzer(Version.LUCENE_29);
        var idwx = new IndexWriter(indexDir, std, true, IndexWriter.MaxFieldLength.UNLIMITED);
        var doc = new Document();            
        var fdl = new Field("testRoot", stringBuilder.ToString(), Field.Store.YES, Field.Index.ANALYZED);
        doc.Add(fdl);
        idwx.AddDocument(doc);
        idwx.Optimize();
        idwx.Commit();
        idwx.Dispose();

        Console.WriteLine("Indexing Done !");
        var parser = new QueryParser(Version.LUCENE_29, "", std);
        var qry = parser.Parse("new*");
        Directory directory = FSDirectory.Open(new System.IO.DirectoryInfo(strIndexDir));
        Searcher srch = new IndexSearcher(IndexReader.Open(directory, true));
        var lol = srch.Search(qry, 100);
        ScoreDoc[] hits = lol.ScoreDocs;
        for (int i = 0; i < hits.Length; i++)
        {
            int docId = hits[i].Doc;
            float score = hits[i].Score;
            Document docy = srch.Doc(docId);
            Console.WriteLine(docy.Get("testRoot"));
        }

Thank you for helping :)

Answer 1

Try either:

var parser = new QueryParser(Version.LUCENE_29, "testRoot", std);

Or:

var qry = parser.Parse("testRoot:new*");

You need to specify the correct field to search in. testRoot appears to be the field name you are looking for. The second argument to QueryParser specifies the default field to search. In the first example you provided, you pass "new", which is not the name of any field added to your document (in that case, your query effectively looks like new:new). The default field is used for searching unless you specify a field in the query itself, such as myField:findThis (see the query parser syntax).
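
As a rough, self-contained sketch of the two options above: this assumes Lucene.Net 3.0.x, and it swaps in an in-memory RAMDirectory and a made-up sample sentence for your C:\Index folder and the PDF text (neither substitution comes from your code). It indexes one document into the testRoot field and then runs both forms of the query:

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

class DefaultFieldSketch
{
    static void Main()
    {
        // Assumption: an in-memory index and sample text stand in for C:\Index and the PDF.
        Directory dir = new RAMDirectory();
        Analyzer std = new StandardAnalyzer(Version.LUCENE_29);

        using (var writer = new IndexWriter(dir, std, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("testRoot", "a brand new document", Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        } // disposing the writer commits the document

        // Option 1: make "testRoot" the parser's default field.
        Query byDefault = new QueryParser(Version.LUCENE_29, "testRoot", std).Parse("new*");

        // Option 2: leave the default field alone and name the field in the query text.
        Query byPrefix = new QueryParser(Version.LUCENE_29, "", std).Parse("testRoot:new*");

        using (var searcher = new IndexSearcher(dir, true))
        {
            foreach (Query qry in new[] { byDefault, byPrefix })
            {
                TopDocs top = searcher.Search(qry, 100);
                Console.WriteLine(qry + " matched " + top.TotalHits + " document(s)");
            }
        }
    }
}
```

Both queries should target the same field, so if one finds your document the other should too. Printing the parsed query, as done here, is a quick way to check which field the parser actually resolved.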

solution1
1 ACCEPTED 2014-09-15 17:58:19
