
Stripping HTML in SOLR for storage, not indexing

I have managed to strip HTML from content when indexing data in SOLR.

But is it possible to strip HTML from the stored value as well, not just from the indexed terms?

This is my field:

<field name="Content" type="textNoHTML" indexed="true" stored="true"/>

The field type "textNoHTML" uses solr.HTMLStripCharFilterFactory:

<charFilter class="solr.HTMLStripCharFilterFactory" />

As I said, this works fine for indexing, but is it possible to apply a similar filter to the stored value?

cheers!

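Char filters and the rest of the analysis chain only affect the indexed tokens; the stored value is always the original input, so the HTML has to be removed before the document reaches the field. If you're using the DataImportHandler, you can use the HTMLStripTransformer.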
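As a rough sketch of the DataImportHandler route, a data-config.xml along these lines should strip the markup before the value is stored. The JDBC driver, URL, credentials, table, and column names are hypothetical placeholders, not taken from the question; only the `transformer="HTMLStripTransformer"` attribute on the entity and `stripHTML="true"` on the field are the relevant parts.

```xml
<dataConfig>
  <!-- Hypothetical JDBC source: replace driver, url, user, and password with your own -->
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/exampledb"
              user="solr"
              password="secret"/>
  <document>
    <!-- The transformer runs on each row before the document is built,
         so both the indexed and the stored value are free of HTML -->
    <entity name="article"
            query="SELECT id, content FROM articles"
            transformer="HTMLStripTransformer">
      <field column="id" name="id"/>
      <!-- stripHTML="true" marks the column the transformer should clean -->
      <field column="content" name="Content" stripHTML="true"/>
    </entity>
  </document>
</dataConfig>
```

With this in place the "Content" field could use a plain text type, since the HTML is already gone by the time the analyzer sees it.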

Otherwise, you'll have to strip the HTML client-side before sending the document to Solr. If your client is .NET, you could use HtmlAgilityPack.
