How to escape “<” character in regex in Solr schema.xml?

I'm using Solr 3.5.0, and in my schema.xml I use the following charFilter to mark the ends of sentences by replacing the end punctuation with a symbolic token:

<charFilter class="solr.PatternReplaceCharFilterFactory" 
pattern="(?<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
replacement=" monkeysentence"/>

I'm not sure if that will even work for what I want, but first I need to solve the problem of escaping the '<' character in the first '?<=' lookbehind.

I get the following error:

org.xml.sax.SAXParseException: The value of attribute "pattern" 
associated with an element type "null" must not contain the '<' character.

I've tried escaping it with a backslash, as in:

 pattern="(?\<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"

But I get the same error.

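Because this is an XML file, you need to use an XML entity to encode <, namely &lt;. A backslash does not help: XML has no backslash escapes, so the parser rejects the raw < before the regex engine ever sees the pattern. (You may also need to encode > as &gt;, " as &quot;, and & as &amp;.)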
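Applied to the charFilter from the question, the fix might look like the sketch below. Only the < in the lookbehind is changed; the rest of the pattern is copied as-is from the question, and whether the regex itself matches sentence ends as intended is a separate matter that this does not address.

```xml
<!-- Same charFilter as in the question, with the lookbehind's "<" written as the entity &lt; -->
<!-- The XML parser decodes &lt; back to "<", so the regex engine still receives (?<=...) -->
<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="(?&lt;=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
            replacement=" monkeysentence"/>
```

The single quote inside the pattern can stay unescaped because the attribute value is delimited by double quotes.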
