Lucene query in C# not finding results with punctuation

I have a search bar that executes a Lucene query on the "description" field, but it doesn't return results when the search term contains an apostrophe. For example, I have a product whose description is Herter's® EZ-Load 200lb Feeder - 99018. When I search for "Herter", I get results, but I get no results if I search for "Herter's" or "Herters". This is my search code:

var query = Request.QueryString["q"];
var search = HttpContext.Current.Server.UrlDecode(query);

var rewardProductLookup = new RewardCatalogDataHelper();
RewardProductSearchCriteria criteria = new RewardProductSearchCriteria()
{
    keywords = search,
    pageSize = 1000,
    sortDirection = "desc"
};

IEnumerable<SkinnyItem> foundProducts = rewardProductLookup.FindByKeywordQuery(criteria);

public IEnumerable<SkinnyItem> FindByKeywordQuery(RewardProductSearchCriteria query)
{
    var luceneIndexDataContext = new LuceneDataContext("rewardproducts", _dbName);
    string fieldToQuery = "rpdescription";
    bool sortDirection = query.sortDirection.ToLower().Equals("desc");

    MultiPhraseQuery multiPhraseQuery = new MultiPhraseQuery();
    var keywords = query.keywords.ToLower().Split(',');
    foreach (var keyword in keywords)
    {
        if (!String.IsNullOrEmpty(keyword))
        {
            var term = new Term(fieldToQuery, keyword);
            multiPhraseQuery.Add(term);
        }
    }

    var booleanQuery = new BooleanQuery();
    booleanQuery.Add(multiPhraseQuery, BooleanClause.Occur.MUST);

    return
        luceneIndexDataContext.BooleanQuerySearch(booleanQuery, fieldToQuery, sortDirection)
            .Where(i => i.Fields["eligibleforpurchase"] == "1");
}

The problem here is analysis. You haven't specified which analyzer is being used, so I'll assume it's StandardAnalyzer.

When analyzed, the text "Herter's" is reduced to the single token "herter". However, no analyzer is applied to the keywords in your FindByKeywordQuery method: you build each Term directly from the lowercased input. So searching for "herter" matches the indexed token, but "herter's" doesn't.
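To make the analysis point concrete, here is a small self-contained C# sketch. `ToyAnalyzer` is a hypothetical, simplified imitation of the classic (Lucene 3.0-era) StandardAnalyzer behaviour assumed above: split on non-alphanumeric characters, keep an apostrophe that sits inside a word, lowercase, then drop a trailing possessive 's. It is not Lucene's real tokenizer and does not reproduce all of its rules.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical toy analyzer: a rough imitation of the classic StandardAnalyzer
// chain (tokenize, lowercase, strip possessive 's). NOT Lucene's implementation.
public static class ToyAnalyzer
{
    public static List<string> Analyze(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            // An apostrophe is kept only when it is between word characters, as in "Herter's".
            bool apostropheInsideWord =
                c == '\'' && current.Length > 0 &&
                i + 1 < text.Length && char.IsLetterOrDigit(text[i + 1]);

            if (char.IsLetterOrDigit(c) || apostropheInsideWord)
            {
                current.Append(c);
            }
            else
            {
                // Any other character (space, '-', '®', ...) ends the current token.
                Flush(current, tokens);
            }
        }
        Flush(current, tokens);
        return tokens;
    }

    private static void Flush(StringBuilder current, List<string> tokens)
    {
        if (current.Length == 0) return;

        string token = current.ToString().ToLowerInvariant();

        // Mimics the classic StandardFilter: drop a trailing possessive "'s".
        if (token.EndsWith("'s", StringComparison.Ordinal))
            token = token.Substring(0, token.Length - 2);

        tokens.Add(token);
        current.Clear();
    }
}
```

Running `ToyAnalyzer.Analyze` on the product description yields tokens such as "herter", "ez", "load" and "feeder", but never "herter's", which is why a Term built from the raw keyword finds nothing. It also hints at why "Herters" fails: StandardAnalyzer does no stemming, so "herters" and "herter" remain different terms.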

One solution would be to use the QueryParser instead of manually constructing a MultiPhraseQuery. The QueryParser runs the query text through the analyzer you give it, so it handles tokenizing, lowercasing, and so on. Something like:

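// VERSION is your Lucene.Net version constant (e.g. Lucene.Net.Util.Version.LUCENE_30);
// use the same analyzer that was used to build the index.
QueryParser parser = new QueryParser(VERSION, fieldToQuery, new StandardAnalyzer(VERSION));
Query luceneQuery = parser.Parse("\"" + query.keywords + "\""); // "query" is already the method parameter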
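One caveat with concatenating `query.keywords` into a quoted phrase: a double quote or backslash typed by the user could end the phrase early or make `Parse` throw a `ParseException`. A minimal sketch of escaping just those two characters follows. `PhraseQueryText.Quote` is a hypothetical helper, and it assumes the classic query syntax, in which (as far as I know) only `"` and `\` are special inside a quoted phrase. Lucene.Net's `QueryParser.Escape` is an alternative that escapes every special query character.

```csharp
using System;
using System.Text;

// Hypothetical helper: wraps raw user input in double quotes so it can be passed
// to QueryParser.Parse as a phrase. Assumes only '"' and '\' need escaping
// inside a quoted phrase in the classic query syntax.
public static class PhraseQueryText
{
    public static string Quote(string userInput)
    {
        if (userInput == null) throw new ArgumentNullException(nameof(userInput));

        var sb = new StringBuilder(userInput.Length + 2);
        sb.Append('"');
        foreach (char c in userInput)
        {
            // Backslash-escape the two characters that would otherwise
            // terminate the phrase or start an escape sequence.
            if (c == '"' || c == '\\')
                sb.Append('\\');
            sb.Append(c);
        }
        sb.Append('"');
        return sb.ToString();
    }
}
```

Usage would then be `parser.Parse(PhraseQueryText.Quote(query.keywords))`. Note that the apostrophe is passed through untouched: it is not a special character in Lucene's query syntax, and the analyzer deals with it.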

In SQL, the single quote is the delimiter for text values in a query:

SELECT * FROM Product WHERE Description = 'foo'

You will need to escape (double) any single quote in your query. Try this in the loop:

foreach (var keyword in keywords)
{
    if (!String.IsNullOrEmpty(keyword))
    {
        // Term has no Replace method, so double the quotes in the string before building the Term
        var term = new Term(fieldToQuery, keyword.Replace("'", "''"));
        multiPhraseQuery.Add(term);
    }
}

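You could also create an extension method (extension methods must be declared in a static class):

using System.Diagnostics;

public static class StringExtensions // any top-level static class works
{
    [DebuggerStepThrough]
    public static string SanitizeSQL(this string value)
    {
        return value.Replace("'", "''").Replace("\\", "\\\\");
    }
}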

in which case you could do this in the loop:

foreach (var keyword in keywords)
{
    if (!String.IsNullOrEmpty(keyword))
    {
        var term = new Term(fieldToQuery, keyword.SanitizeSQL());
        multiPhraseQuery.Add(term);
    }
}

Hope this helps.
