简体   繁体   中英

Searching issue using lucene.Net

I'm trying to your the search engine of Lucene .Net. i followed some documentation on the website but i might have missed something since it doesn't work as expected ..

Here is the code :

var stringBuilder = new StringBuilder();
        var pdfReader = new PdfReader(@"c:\Test\testRoot.pdf");
        for (var page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            stringBuilder.Append(PdfTextExtractor.GetTextFromPage(pdfReader, page) + " ");
        }
        if (stringBuilder.ToString().Contains("new"))
        {
            Console.WriteLine("New is present in the text!");
        }
        const string strIndexDir = @"C:\Index";
        Directory indexDir = FSDirectory.Open(strIndexDir);
        Analyzer std = new StandardAnalyzer(Version.LUCENE_29);
        var idwx = new IndexWriter(indexDir, std, true, IndexWriter.MaxFieldLength.UNLIMITED);
        var doc = new Document();            
        var fdl = new Field("testRoot", stringBuilder.ToString(), Field.Store.YES, Field.Index.ANALYZED);
        doc.Add(fdl);
        idwx.AddDocument(doc);
        idwx.Optimize();
        idwx.Dispose();
        Console.WriteLine("Indexing Done !");


        var parser = new QueryParser(Version.LUCENE_29, "new", std);
        var qry = parser.Parse(parser.Field);
        Directory directory = FSDirectory.Open(new System.IO.DirectoryInfo(strIndexDir));
        Searcher srch = new IndexSearcher(IndexReader.Open(directory, true));
        TopScoreDocCollector cllstr = TopScoreDocCollector.Create(100, true);
        ScoreDoc[] hits = cllstr.TopDocs().ScoreDocs;
        for (int i = 0; i < hits.Length; i++)
        {
            int docId = hits[i].Doc;
            float score = hits[i].Score;
            Document docy = srch.Doc(docId);
            Console.WriteLine(docy.Get("text"));
        }
        Console.ReadLine();

The thing is That the word new is present in the text of my PDF since it goes in the 'if'.

but at the end, when i try to look for the match, nothing's here ..

EDIT:

i made few changes but still doesn't work:

var stringBuilder = new StringBuilder();
        var pdfReader = new PdfReader(@"c:\Test\testRoot.pdf");
        for (var page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            stringBuilder.Append(PdfTextExtractor.GetTextFromPage(pdfReader, page) + " ");
        }
        if (stringBuilder.ToString().Contains("new"))
        {
            Console.WriteLine("New is present in the text!");
        }
        const string strIndexDir = @"C:\Index";
        Directory indexDir = FSDirectory.Open(strIndexDir);
        Analyzer std = new StandardAnalyzer(Version.LUCENE_29);
        var idwx = new IndexWriter(indexDir, std, true, IndexWriter.MaxFieldLength.UNLIMITED);
        var doc = new Document();            
        var fdl = new Field("testRoot", stringBuilder.ToString(), Field.Store.YES, Field.Index.ANALYZED);
        doc.Add(fdl);
        idwx.AddDocument(doc);
        idwx.Optimize();
        idwx.Commit();
        idwx.Dispose();

        Console.WriteLine("Indexing Done !");
        var parser = new QueryParser(Version.LUCENE_29, "", std);
        var qry = parser.Parse("new*");
        Directory directory = FSDirectory.Open(new System.IO.DirectoryInfo(strIndexDir));
        Searcher srch = new IndexSearcher(IndexReader.Open(directory, true));
        var lol = srch.Search(qry, 100);
        ScoreDoc[] hits = lol.ScoreDocs;
        for (int i = 0; i < hits.Length; i++)
        {
            int docId = hits[i].Doc;
            float score = hits[i].Score;
            Document docy = srch.Doc(docId);
            Console.WriteLine(docy.Get("testRoot"));
        }

Thanks you for helping :)

Try either:

var parser = new QueryParser(Version.LUCENE_29, "testRoot", std);

Or:

var qry = parser.Parse("testRoot:new*");

You need to specify the correct field to search in. It appears the testRoot is the field name you are looking for. The second argument to QueryParser specifies the default field to search. In the first example shown you provided, you call it "new", which doesn't appear to be the name of a field being added to your document (essentially, in that case, your query looks like: new:new ). This default field will be used for searching unless you specify the field to search in your query, such as myField:findThis (see the query parser syntax ).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM