I'm indexing user comments related to an entity, keyed by entity id. Example of the comments schema:
<fields>
<field name="entity_id" type="sint" indexed="true" stored="true" default="0"/>
<field name="comment_id" type="sint" indexed="true" stored="true" default="0"/>
<field name="comment_text" type="text" indexed="true" stored="true" default=""/>
</fields>
Now I want to query all comments for a specific entity and get the phrases that are repeated several times in that set of comments.
Example of comments:
As you can see in the example above, "Great place" is repeated several times, and so is "place to bee". I need these phrases returned from Solr. I've tried Solr facets, but I only managed to get single words, not phrases (see "Building a tag cloud with solr").
The query I tried was roughly this:
http://localhost:8984/solr/select/?qt=tvrh&q=entity_id:12345&start=0&rows=0&facet=true&facet.field=comment_text&facet.minCount=1&facet.limit=50
The results were:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="comment_text">
<int name="epic">22</int>
<int name="bar">18</int>
<int name="you">16</int>
<int name="quiver">15</int>
<int name="happi">14</int>
<int name="your">14</int>
<int name="hour">13</int>
<int name="drink">12</int>
<int name="come">11</int>
<int name="get">11</int>
<int name="free">9</int> ...
Note: these results are not related to the example comments posted earlier :).
Thanks.
Have you looked at using the ShingleFilterFactory? With this filter, you can combine tokens into phrases for indexing. You could create a field that is just a copy of comment_text, use this filter on that field, and then get facets from it.
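A minimal sketch of what that could look like in schema.xml. The names text_shingle and comment_shingles are hypothetical, and the tokenizer and lowercase filter are assumptions rather than anything taken from your existing "text" type. Note that minShingleSize may not be available in older Solr versions; maxShingleSize and outputUnigrams should be.

```xml
<!-- Hypothetical field type: splits text into words, then emits 2- and 3-word shingles -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- outputUnigrams="false" keeps single words out of the facet counts -->
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2" maxShingleSize="3" outputUnigrams="false"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field used only for faceting, so it does not need to be stored -->
<field name="comment_shingles" type="text_shingle" indexed="true" stored="false"/>

<!-- Feed the raw comment text into the shingled field at index time -->
<copyField source="comment_text" dest="comment_shingles"/>
```

Since copyField is applied at index time, existing comments would need to be reindexed. After that, faceting on the new field instead of comment_text should return phrases, and facet.mincount=2 would limit the output to phrases that occur in more than one comment:

http://localhost:8984/solr/select/?q=entity_id:12345&start=0&rows=0&facet=true&facet.field=comment_shingles&facet.mincount=2&facet.limit=50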