
Multi-term phrases in Lucene

I am reading the Lucene in Action book and I do not understand the multi-term phrases part.

The following text is indexed:

the quick brown fox jumped over the lazy dog

You then add the terms quick, jumped, and lazy to a PhraseQuery with a slop of 4. That results in a match, but I don't understand why. How do you count the number of moves when there are multiple terms?

The same goes for the terms lazy, jumped, quick with a slop of 8.

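The slop is effectively an edit distance: the number of single-position moves needed to rearrange the query terms into the order and spacing they have in the indexed text. Each extra position of gap between two terms adds 1 to the distance, and transposing two adjacent terms adds 2 (the first move places the two terms on the same position, the second moves one past the other).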

You can go through the edits one at a time to illustrate:

  • quick jumped lazy distance:0
  • quick _ jumped lazy distance:1
  • quick _ _ jumped lazy distance:2
  • quick _ _ jumped _ lazy distance:3
  • quick _ _ jumped _ _ lazy distance:4

And for the second case:

  • lazy jumped quick distance:0
  • lazy/jumped quick distance:1
  • lazy/jumped/quick distance:2 (all three terms superimposed, in the same position)
  • quick lazy/jumped distance:3
  • quick jumped lazy distance:4
  • quick _ jumped lazy distance:5
  • quick _ _ jumped lazy distance:6
  • quick _ _ jumped _ lazy distance:7
  • quick _ _ jumped _ _ lazy distance:8
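
The two counts above can also be reproduced with a small calculation. The following is only a rough sketch, not Lucene's own code: the class SlopSketch and the method requiredSlop are made-up names, and it assumes every phrase term occurs exactly once in the text. For each term it takes the position in the text minus the position in the query, and reports the difference between the largest and smallest of those shifts as the slop needed.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of sloppy phrase matching (hypothetical helper, not Lucene's API).
class SlopSketch {

    // Smallest slop at which the phrase would match, assuming each phrase term
    // occurs exactly once in the text. Returns -1 if a term is missing.
    static int requiredSlop(List<String> textTokens, List<String> phraseTerms) {
        if (phraseTerms.isEmpty()) {
            return 0;
        }
        int minShift = Integer.MAX_VALUE;
        int maxShift = Integer.MIN_VALUE;
        for (int queryPos = 0; queryPos < phraseTerms.size(); queryPos++) {
            int textPos = textTokens.indexOf(phraseTerms.get(queryPos));
            if (textPos < 0) {
                return -1; // term not in the text, so no slop can produce a match
            }
            // How far this term has to move from its place in the query.
            int shift = textPos - queryPos;
            minShift = Math.min(minShift, shift);
            maxShift = Math.max(maxShift, shift);
        }
        // Spread between the largest and smallest shift.
        return maxShift - minShift;
    }

    public static void main(String[] args) {
        List<String> text =
            Arrays.asList("the quick brown fox jumped over the lazy dog".split(" "));
        System.out.println(requiredSlop(text, Arrays.asList("quick", "jumped", "lazy")));
        System.out.println(requiredSlop(text, Arrays.asList("lazy", "jumped", "quick")));
    }
}
```

For quick jumped lazy the shifts are 1, 3 and 5, giving 5 - 1 = 4; for lazy jumped quick they are 7, 3 and -1, giving 7 - (-1) = 8, which agrees with the step-by-step lists.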
