I'm using Solr 3.5.0, and in my schema.xml I use the following to mark the end of sentences and replace the end punctuation with a symbolic token:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(?<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
replacement=" monkeysentence"/>
I'm not sure if that will even work for what I want, but first I need to solve the problem of escaping the '<' character in the first '?<=' lookbehind.
I get the following error:
org.xml.sax.SAXParseException: The value of attribute "pattern"
associated with an element type "null" must not contain the '<' character.
I've tried escaping it with a backslash, as in:
pattern="(?\<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
But I get the same error.
As this is in an XML file, you will need to use an XML escape to encode '<', namely '&lt;' (you may also need to encode '>' as '&gt;', '"' as '&quot;', and '&' as '&amp;').
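Applied to the charFilter from the question, that would look roughly like the fragment below. This is a minimal sketch: the only change is '<' to '&lt;' in the lookbehind, and the rest of the pattern is copied as posted, so it says nothing about whether the regex itself does what you want.

```xml
<!-- Only the '<' in the lookbehind is replaced with &lt;; everything else is unchanged. -->
<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="(?&lt;=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
            replacement=" monkeysentence"/>
```

The XML parser decodes '&lt;' back to '<' before the attribute value is handed on, so the regex engine still receives '(?<=' as intended. A backslash is not an escape character in XML, which is why '\<' produces the same error; backslashes in the attribute reach the regex exactly as written.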