How to match exact text in Lucene search?

I'm trying to match the text "Config migration from ASA5505 8.2 to ASA5516" in the TITLE column.

My program looks like this:

Directory directory = FSDirectory.open(indexDir);

MultiFieldQueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_35,new String[] {"TITLE"}, new StandardAnalyzer(Version.LUCENE_35));        
IndexReader reader = IndexReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);       
queryParser.setPhraseSlop(0);
queryParser.setLowercaseExpandedTerms(true);
Query query = queryParser.parse("TITLE:Config migration from ASA5505 8.2 to ASA5516");
System.out.println(query);
TopDocs topDocs = searcher.search(query,100);
System.out.println(topDocs.totalHits);
ScoreDoc[] hits = topDocs.scoreDocs;
System.out.println(hits.length + " Record(s) Found");
for (int i = 0; i < hits.length; i++) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
    System.out.println("\"Title :\" " +d.get("TITLE") );
}

But it's returning:

"Title :" Config migration from ASA5505 8.2 to ASA5516
"Title :" Firewall  migration from ASA5585 to  ASA5555
"Title :" Firewall  migration from ASA5585 to  ASA5555

The last two results are not expected. What modification is required to match the exact text "Config migration from ASA5505 8.2 to ASA5516"?

And my indexing function looks like this:

public class Lucene {
public static final String INDEX_DIR = "./Lucene";
private static final String JDBC_DRIVER = "oracle.jdbc.OracleDriver";
private static final String CONNECTION_URL = "jdbc:oracle:thin:xxxxxxx";

private static final String USER_NAME = "localhost";
private static final String PASSWORD = "localhost";
private static final String QUERY = "select * from TITLE_TABLE";

public static void main(String[] args) throws Exception {
    File indexDir = new File(INDEX_DIR);
    Lucene indexer = new Lucene();
    try {
        Date start = new Date();
        Class.forName(JDBC_DRIVER).newInstance();
        Connection conn = DriverManager.getConnection(CONNECTION_URL, USER_NAME, PASSWORD);
        SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
        IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig);
        System.out.println("Indexing to directory '" + indexDir + "'...");
        int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
        indexWriter.close();
        System.out.println(indexedDocumentCount + " records have been indexed successfully");
        System.out.println("Total Time:" + (new Date().getTime() - start.getTime()) / (1000));
    } catch (Exception e) {
        e.printStackTrace();
    }
}

int indexDocs(IndexWriter writer, Connection conn) throws Exception {
    String sql = QUERY;
    Statement stmt = conn.createStatement();
    stmt.setFetchSize(100000);
    ResultSet rs = stmt.executeQuery(sql);
    int i = 0;
    while (rs.next()) {
        System.out.println("Adding Doc No:" + i);
        Document d = new Document();
        System.out.println(rs.getString("TITLE"));
        d.add(new Field("TITLE", rs.getString("TITLE"), Field.Store.YES, Field.Index.ANALYZED));
        d.add(new Field("NAME", rs.getString("NAME"), Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(d);
        i++;
    }
    return i;
}
}

Try PhraseQuery as follows:

BooleanQuery mainQuery= new BooleanQuery(); 
String searchTerm="config migration from asa5505 8.2 to asa5516";
String strArray[]= searchTerm.split(" ");
for(int index=0;index<strArray.length;index++)
{
    PhraseQuery query1 = new PhraseQuery();
     query1.add(new Term("TITLE",strArray[index]));
     mainQuery.add(query1,BooleanClause.Occur.MUST);
}

And then execute the mainQuery.

Check out this Stack Overflow thread; it may help you use PhraseQuery for exact search.

PVR is correct that using a phrase query is probably the right solution here, but they missed how to use the PhraseQuery class. You are already using QueryParser though, so just use the query parser syntax by enclosing your search text in quotes:

Query query = queryParser.parse("TITLE:\"Config migration from ASA5505 8.2 to ASA5516\"");
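
To see why the quotes matter, here is a toy, Lucene-free model of the two query modes. It assumes the unquoted query is treated as an OR over its individual terms (QueryParser's default operator), while the quoted query is a phrase whose terms must appear consecutively and in order. The class and method names are hypothetical, and tokenization is reduced to lowercasing and splitting on whitespace, which is simpler than what a real analyzer does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: none of this is Lucene API.
final class QueryModeDemo {

    // Simplified tokenization (an assumption): lowercase, split on whitespace.
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Unquoted query: a title matches if it shares ANY term with the query.
    static boolean matchesAnyTerm(String title, String query) {
        return !Collections.disjoint(tokens(title), tokens(query));
    }

    // Quoted query: the query terms must appear consecutively, in order.
    static boolean matchesPhrase(String title, String query) {
        return Collections.indexOfSubList(tokens(title), tokens(query)) >= 0;
    }

    public static void main(String[] args) {
        String query = "Config migration from ASA5505 8.2 to ASA5516";
        String[] titles = {
            "Config migration from ASA5505 8.2 to ASA5516",
            "Firewall  migration from ASA5585 to  ASA5555"
        };
        for (String title : titles) {
            System.out.println(title
                    + " | any term: " + matchesAnyTerm(title, query)
                    + " | phrase: " + matchesPhrase(title, query));
        }
    }
}
```

In this model the Firewall title matches the unquoted query because it shares "migration", "from" and "to" with it, but it does not match the phrase. That is consistent with the extra hits reported in the question.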

Based on your update, you are using a different analyzer at index time and query time. SimpleAnalyzer and StandardAnalyzer don't do the same things. Unless you have a very good reason to do otherwise, you should analyze the same way when indexing and querying.

So, change the analyzer in your indexing code to StandardAnalyzer (or vice versa, use SimpleAnalyzer when querying), and you should see better results.
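
The mismatch can be illustrated without Lucene. The sketch below only approximates the two analyzers. It assumes that SimpleAnalyzer lowercases and splits on every non-letter character, while StandardAnalyzer keeps tokens such as asa5505 and 8.2 intact, lowercases them, and drops English stop words such as "to". The method names and the abbreviated stop-word list are hypothetical stand-ins, and the whitespace split stands in for the real tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Rough approximation of analyzer behaviour; not Lucene code.
final class AnalyzerMismatchDemo {

    // Assumed, abbreviated stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "to", "of"));

    // SimpleAnalyzer-like: lowercase, keep runs of letters only,
    // so digits and punctuation act as token separators.
    static List<String> simpleLike(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // StandardAnalyzer-like: lowercase, keep whitespace-separated tokens
    // whole, and remove stop words.
    static List<String> standardLike(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        String title = "Config migration from ASA5505 8.2 to ASA5516";
        System.out.println("index time (simple-like):   " + simpleLike(title));
        System.out.println("query time (standard-like): " + standardLike(title));
    }
}
```

Under these assumptions the index contains the term asa but never asa5505, so a query-time term such as asa5505 has nothing to match, and a phrase built from the query-time tokens cannot line up with the indexed ones.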

Here is what I have written for you, which works perfectly:

Use: queryParser.parse("\"Config migration from ASA5505 8.2 to ASA5516\"");

1. To create the index

    public static void main(String[] args) {
        IndexWriter writer = getIndexWriter();
        Document doc = new Document();
        Document doc1 = new Document();
        Document doc2 = new Document();
        doc.add(new Field("TITLE", "Config migration from ASA5505 8.2 to ASA5516", Field.Store.YES, Field.Index.ANALYZED));
        doc1.add(new Field("TITLE", "Firewall migration from ASA5585 to ASA5555", Field.Store.YES, Field.Index.ANALYZED));
        doc2.add(new Field("TITLE", "Firewall migration from ASA5585 to ASA5555", Field.Store.YES, Field.Index.ANALYZED));
        try {
            writer.addDocument(doc);
            writer.addDocument(doc1);
            writer.addDocument(doc2);
            writer.close();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public static IndexWriter getIndexWriter() {
        IndexWriter indexWriter = null;
        try {
            File file = new File("D://index//");
            if (!file.exists())
                file.mkdir();
            IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_34, new StandardAnalyzer(Version.LUCENE_34));
            Directory directory = FSDirectory.open(file);
            indexWriter = new IndexWriter(directory, conf);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return indexWriter;
    }

2. To search for the string

    public static void main(String[] args) {
        IndexReader reader = getIndexReader();
        IndexSearcher searcher = new IndexSearcher(reader);
        QueryParser parser = new QueryParser(Version.LUCENE_34, "TITLE", new StandardAnalyzer(Version.LUCENE_34));
        try {
            // Quoting the text makes the parser build a phrase query.
            Query query = parser.parse("\"Config migration from ASA5505 8.2 to ASA5516\"");
            TopDocs hits = searcher.search(query, 3);
            ScoreDoc[] document = hits.scoreDocs;
            for (int i = 0; i < document.length; i++) {
                // Look up each hit by its document ID, not by the loop index.
                Document doc = searcher.doc(document[i].doc);
                System.out.println("TITLE=" + doc.get("TITLE"));
            }
            searcher.close();
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

public static IndexReader getIndexReader()
{
    IndexReader reader=null;

    Directory dir;
    try 
    {
        dir = FSDirectory.open(new File("D://index//"));
        reader=IndexReader.open(dir);
    } catch (IOException e) 
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    return reader;
}   
