
How to search content with an exact phrase using the Lucene API?

Input phrase to search: Adil Shahi dynasty

  1. Adil Shahi dynasty
  2. Qutb Shahi dynasty
  3. Gohar Shahi templates

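When I enter "Adil Shahi dynasty", it returns many results. I am using the Lucene API and want to match content against the exact phrase. Code for creating the index: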

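import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SearchIndex {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        Directory index = FSDirectory.open(new File("/ttlfiles/indexes/category_labels_en"));
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String querystr = br.readLine();
        while (querystr != null && !querystr.equals("q")) {
            // The classic QueryParser joins unquoted words with OR by default, so
            // "Adil Shahi dynasty" matches any document containing any one of the words.
            Query q = new QueryParser(Version.LUCENE_47, "spa", analyzer).parse(querystr);

            // 3. search
            int hitsPerPage = 10;
            IndexReader reader = DirectoryReader.open(index);
            IndexSearcher searcher = new IndexSearcher(reader);
            TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
            searcher.search(q, collector);
            ScoreDoc[] hits = collector.topDocs().scoreDocs;

            // 4. display results
            System.out.println("Found " + hits.length + " hits.");
            for (int i = 0; i < hits.length; ++i) {
                Document d = searcher.doc(hits[i].doc);
                System.out.println((i + 1) + ". " + d.get("spa"));
            }
            reader.close(); // release the reader opened for this query
            querystr = br.readLine();
        }
    }
}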
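
A minimal sketch of why the query above returns so many hits, written in plain Java rather than against Lucene: under the classic QueryParser's default OR operator, an unquoted query such as Adil Shahi dynasty matches any document that contains any one of its words, whereas an exact phrase requires the words to appear consecutively and in order. The tokenizer here (lowercase, split on non-alphanumeric characters) is an assumed, simplified stand-in for StandardAnalyzer, and the class and method names are hypothetical. In Lucene itself, wrapping the input in double quotes before calling parse() makes QueryParser build a phrase query instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PhraseVsTermsDemo {
    // Simplified tokenizer (assumption, not StandardAnalyzer): lowercase the text
    // and split on anything that is not a letter or a digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Behaviour of an unquoted query with the default OR operator:
    // a document matches if it contains at least one query term.
    public static boolean matchesAnyTerm(String doc, String query) {
        List<String> docTokens = tokenize(doc);
        for (String term : tokenize(query)) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Exact phrase: the query terms must appear in order at consecutive positions.
    public static boolean matchesPhrase(String doc, String query) {
        List<String> queryTokens = tokenize(query);
        if (queryTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(tokenize(doc), queryTokens) >= 0;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Adil Shahi dynasty", "Qutb Shahi dynasty", "Gohar Shahi templates");
        String query = "Adil Shahi dynasty";
        for (String doc : docs) {
            System.out.println(doc + " | any term: " + matchesAnyTerm(doc, query)
                + " | exact phrase: " + matchesPhrase(doc, query));
        }
    }
}
```

All three titles share the word "shahi", so the OR-style check accepts every one of them, while only the first title contains the full phrase.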

@Gimby: Perhaps the user picked the wrong code for searching content with Lucene. You have to create the Lucene index before you can search its content.

You can refer to the following code, which searches for the exact phrase:

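import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class PhraseSearch {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        Directory index = FSDirectory.open(new File("/media/New Volume/ttlindexes"));
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String querystr = br.readLine();
        while (querystr != null && !querystr.equals("q")) {
            QueryParser parser = new QueryParser(Version.LUCENE_47, "spo", analyzer);
            // createPhraseQuery() analyzes the input; for multi-word input it builds a
            // phrase query whose terms must appear in this order at adjacent positions
            // (slop 0). It returns null when the analyzer produces no terms.
            Query query = parser.createPhraseQuery("spo", querystr);
            if (query == null) {
                querystr = br.readLine();
                continue;
            }

            // 3. search (page size large enough to return every match)
            int hitsPerPage = 1000000;
            IndexReader reader = DirectoryReader.open(index);
            IndexSearcher searcher = new IndexSearcher(reader);
            TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
            searcher.search(query, collector);
            ScoreDoc[] hits = collector.topDocs().scoreDocs;

            // 4. display results
            System.out.println("Found " + hits.length + " hits.");
            for (int i = 0; i < hits.length; ++i) {
                Document d = searcher.doc(hits[i].doc);
                System.out.println((i + 1) + ". " + d.get("spo"));
            }
            reader.close(); // release the reader opened for this query
            querystr = br.readLine();
        }
    }
}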
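
For tokenized text fields Lucene stores the position of each term in the index, and a phrase query with slop 0 only matches documents where the phrase terms sit at consecutive positions. The toy positional index below sketches that idea in plain Java; it is not Lucene's internal implementation of PhraseQuery, the tokenizer is the same assumed simplification of StandardAnalyzer, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PositionalIndexDemo {
    // term -> (docId -> ascending positions of that term in the document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Simplified tokenizer (assumption): lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index a document, recording the position of every token; returns its id.
    public int add(String text) {
        int docId = docs.size();
        docs.add(text);
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new HashMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
        return docId;
    }

    // Exact phrase (slop 0): phrase term i must occur at position start + i.
    public List<Integer> searchPhrase(String phrase) {
        List<String> terms = tokenize(phrase);
        List<Integer> result = new ArrayList<>();
        if (terms.isEmpty()) {
            return result;
        }
        Map<Integer, List<Integer>> first =
            postings.getOrDefault(terms.get(0), Collections.emptyMap());
        for (Map.Entry<Integer, List<Integer>> entry : first.entrySet()) {
            int docId = entry.getKey();
            for (int start : entry.getValue()) {
                boolean ok = true;
                for (int i = 1; i < terms.size() && ok; i++) {
                    Map<Integer, List<Integer>> byDoc = postings.get(terms.get(i));
                    List<Integer> positions = byDoc == null ? null : byDoc.get(docId);
                    ok = positions != null && positions.contains(start + i);
                }
                if (ok) {
                    result.add(docId);
                    break; // one occurrence is enough for this document
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public String doc(int docId) {
        return docs.get(docId);
    }

    public static void main(String[] args) {
        PositionalIndexDemo index = new PositionalIndexDemo();
        index.add("Adil Shahi dynasty");
        index.add("Qutb Shahi dynasty");
        index.add("Gohar Shahi templates");
        for (int id : index.searchPhrase("Adil Shahi dynasty")) {
            System.out.println(id + ". " + index.doc(id));
        }
    }
}
```

With the three titles from the question indexed, searchPhrase("Adil Shahi dynasty") returns only document 0, while searchPhrase("Shahi") returns all three, which mirrors the difference between the phrase query and the single shared term.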

@Aadil: Thanks for your guidance. I used it after indexing DBpedia's TTL files. You can download the Turtle files from http://wiki.dbpedia.org/Downloads39.
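
Before any of the search code can run, the label text has to be extracted from the Turtle files and indexed. A minimal plain-Java sketch, assuming each relevant line holds a single triple of the form <subject> <predicate> "literal"@lang . (the sample line and the class name are illustrative, not taken from a DBpedia file): it returns the subject IRI and the raw literal, and leaves escape sequences such as \" undecoded. Each extracted label would then be added to a Lucene Document, for example as a TextField under the question's spa field.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TtlLabelExtractor {
    /** Subject IRI and literal text of one triple. */
    public static final class Label {
        public final String subject;
        public final String text;

        Label(String subject, String text) {
            this.subject = subject;
            this.text = text;
        }
    }

    // Assumed line shape: <subject> <predicate> "literal"@lang .
    // Group 1 = subject IRI, group 3 = literal body (escapes kept as-is).
    private static final Pattern TRIPLE = Pattern.compile(
        "^<([^>]*)>\\s+<([^>]*)>\\s+\"((?:[^\"\\\\]|\\\\.)*)\"(?:@[A-Za-z0-9-]+)?\\s*\\.\\s*$");

    public static Optional<Label> parse(String line) {
        Matcher m = TRIPLE.matcher(line);
        if (!m.matches()) {
            return Optional.empty(); // comments, blank lines, lines of any other shape
        }
        return Optional.of(new Label(m.group(1), m.group(3)));
    }

    public static void main(String[] args) {
        // Illustrative line, not copied from a DBpedia dump.
        String line = "<http://dbpedia.org/resource/Category:Adil_Shahi_dynasty> "
            + "<http://www.w3.org/2000/01/rdf-schema#label> \"Adil Shahi dynasty\"@en .";
        parse(line).ifPresent(l -> System.out.println(l.subject + " -> " + l.text));
    }
}
```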
