
Parsing Lucene Query syntax and escaping for CloudSearch

Basically, I have an application that needs to support both Lucene.NET and Amazon CloudSearch.

So I can't rewrite the queries; I need to use the standard Lucene queries and call .ToString() on the query to get the syntax.

The issue is that in Lucene.NET (I don't know if this is the same in the Java version), the .ToString() method returns the raw string without the escape characters.

Therefore, things like:

(title:blah:blah summary:"lala:la")

should be

(title:blah\:blah summary:"lala\:la")

What I need is a regex that will add the escapes.

Is this possible? If so, what would it look like?

Some additional possible variations:

(title:"this is a search:term")
(field5:"this is a title:term")

Based on comments and edits, it seems that you want a regex that can correctly escape any query string, so that any given Lucene query is accurately represented by the resulting string.

That ain't gonna happen.

Lucene query syntax cannot express every Lucene query. In fact, the string you get from Query.toString() often can't even be parsed by the QueryParser, never mind being an accurate reconstruction of the query.

The long and short of it: you are going about this the wrong way. Query.ToString() is not designed to serialize the query, and its goal is not to produce a parsable query string. It's mainly for debugging and such. If you keep using it this way, this tomfoolery of regex-escaping ambiguous query syntax will likely be just the start of your troubles.

This question provides another example of this.
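One way to sidestep the ambiguity is to escape each term while the field and the text are still separate values, instead of patching the flattened string afterwards. Below is a minimal Python sketch of that idea. The `Term` and `AnyOf` classes and the `render` function are hypothetical stand-ins invented for illustration, not Lucene types, and the escaped character set follows Lucene's classic query syntax; whether CloudSearch accepts exactly the same escapes is an assumption you would need to verify.

```python
import re
from dataclasses import dataclass
from typing import List, Union

# Characters treated as special by Lucene's classic query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/|&])')


def escape_term(text: str) -> str:
    """Put a backslash in front of every special character in a raw term."""
    return _SPECIAL.sub(r"\\\1", text)


@dataclass
class Term:
    # Hypothetical stand-in for a single-field term or phrase query.
    field: str
    text: str
    phrase: bool = False


@dataclass
class AnyOf:
    # Hypothetical stand-in for a boolean query of optional clauses.
    clauses: List[Union["Term", "AnyOf"]]


def render(query: Union[Term, AnyOf]) -> str:
    """Build the query string, escaping term text before it is joined."""
    if isinstance(query, Term):
        text = escape_term(query.text)
        if query.phrase:
            text = f'"{text}"'
        return f"{query.field}:{text}"
    return "(" + " ".join(render(c) for c in query.clauses) + ")"


print(render(AnyOf([Term("title", "blah:blah"),
                    Term("summary", "lala:la", phrase=True)])))
```

Because the field name never passes through `escape_term`, the colon that separates field from value is always left alone, no matter what the field is called or what the term contains. In a real application the same walk would be done over the actual query objects, and Lucene ships a comparable escaping helper (`QueryParser.escape` in the Java version).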

You can use this regex to find the colons that need escaping, i.e. every colon that does not directly follow a field name:

(?<!title|summary):

Then replace each matched colon with \:

Explanation

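The negative lookbehind (?<!title|summary) rejects any colon that is immediately preceded by title or summary, so only the remaining colons are matched. The field names are hardcoded, so any other field has to be added to the alternation.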


Input

(title:blah:blah summary:"lala:la")

Output

(title:blah\:blah summary:"lala\:la")
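For reference, here is a rough Python sketch of that substitution. Python's `re` module only allows fixed-width lookbehinds, so the alternation is split into two separate lookbehinds, which should match the same colons as the pattern above. The function name `escape_colons` is made up for illustration.

```python
import re

# Two fixed-width negative lookbehinds stand in for (?<!title|summary),
# because Python's re rejects lookbehind alternatives of different lengths.
_UNESCAPED_COLON = re.compile(r"(?<!title)(?<!summary):")


def escape_colons(query: str) -> str:
    """Backslash-escape every colon not directly preceded by a listed field name."""
    return _UNESCAPED_COLON.sub(r"\\:", query)


print(escape_colons('(title:blah:blah summary:"lala:la")'))
print(escape_colons('(field5:"this is a title:term")'))
```

The first call reproduces the output shown above. The second call shows where the approach breaks down: the colon after field5 gets escaped because field5 is not in the list, while the colon after title inside the quoted phrase is left unescaped.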
