
SOLR related phrase search

I'm indexing user comments related to an entity, keyed by entity id. An example of the comments schema:

<fields>
   <field name="entity_id" type="sint" indexed="true" stored="true" default="0"/>
   <field name="comment_id" type="sint" indexed="true" stored="true" default="0"/>
   <field name="comment_text" type="text" indexed="true" stored="true" default=""/>
</fields>

Now I want to query all comments for a specific entity and get the phrases that are repeated several times in that set of comments.

Example of comments:

  • This is great place
  • You should really visit XYZ. Great place to bee.
  • If you want to spend awesome moments, this is the place to bee.
  • Great people and great place.

As you can see in the example above, "Great place" is repeated several times, and so is "place to bee". I need these phrases returned from SOLR. I've tried SOLR facets, but I only managed to get single words, not phrases (see "Building a tag cloud with solr").

The query I tried was roughly this:

http://localhost:8984/solr/select/?qt=tvrh&q=entity_id:12345&start=0&rows=0&facet=true&facet.field=comment_text&facet.minCount=1&facet.limit=50

Results were...

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
  <lst name="comment_text">
    <int name="epic">22</int>
    <int name="bar">18</int>
    <int name="you">16</int>
    <int name="quiver">15</int>
    <int name="happi">14</int>
    <int name="your">14</int>
    <int name="hour">13</int>
    <int name="drink">12</int>
    <int name="come">11</int>
    <int name="get">11</int>
    <int name="free">9</int> ...

Note: these results are not related to the example comments posted earlier :).

Thanks.

Have you looked at using the ShingleFilterFactory? With this filter, you can combine adjacent tokens into phrases at index time. You could create a field that is just a copy of comment_text, apply this filter to that field, and then get facets from that field.
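
As a rough sketch of that setup, something like the following could go into schema.xml. The type name text_shingle and the field name comment_shingles are made up for illustration, and the tokenizer and lowercase filter are assumptions; adapt them to whatever analysis chain your existing text type uses.

```xml
<!-- Hypothetical field type: emits 2- and 3-word shingles, no single words -->
<types>
  <fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- Lowercase so "Great place" and "great place" count as the same phrase -->
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- maxShingleSize="3" yields phrases of up to three tokens;
           outputUnigrams="false" keeps single words out of the facet list -->
      <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="false"/>
    </analyzer>
  </fieldType>
</types>

<!-- Hypothetical facet-only field; it does not need to be stored -->
<fields>
  <field name="comment_shingles" type="text_shingle" indexed="true" stored="false"/>
</fields>

<!-- copyField passes the raw comment text, which is then analyzed as shingles -->
<copyField source="comment_text" dest="comment_shingles"/>
```

After reindexing, you would facet on the new field instead of comment_text, for example:

http://localhost:8984/solr/select/?q=entity_id:12345&rows=0&facet=true&facet.field=comment_shingles&facet.mincount=2&facet.limit=50

Keep in mind that facet counts are the number of matching comments containing a phrase, not the total number of occurrences. With the example comments above, "great place" and "place to bee" should show up, but so will less interesting shingles such as "this is", so you may want a stop filter or some post-filtering on the client side.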
