
Calculating the similarity between 2 sentences

I would like to calculate the similarity between two sentences, and I need a percentage value that says how well they match each other. For example:

1. The red fox is moving on the hill.
2. The black fox is moving in the bill.

I was considering Levenshtein distance, but I am not sure about it because it is described as a way to find the similarity between two words. Can Levenshtein distance help me here, or is there another method that would work better? I will be using JavaScript.
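Levenshtein distance is not limited to single words. The same dynamic-programming algorithm works on any sequence, so it can be run over the sequence of words in each sentence instead of the sequence of characters. A minimal sketch, assuming a simple lowercase tokenizer and normalizing the distance by the length of the longer sentence to get a percentage (both illustrative choices; the function names are hypothetical):

```javascript
// Sketch: word-level Levenshtein distance turned into a 0-100 similarity.
// Assumptions: lowercase tokens made of letters, digits and apostrophes;
// normalization by the longer sentence's word count.

// Split a sentence into lowercase words, dropping punctuation.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Classic edit distance over arrays, so it accepts arrays of words
// as well as arrays of characters.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution, or match when cost is 0
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// 100 means the two word sequences are identical.
function levenshteinSimilarity(s1, s2) {
  const w1 = tokenize(s1);
  const w2 = tokenize(s2);
  const longest = Math.max(w1.length, w2.length);
  if (longest === 0) return 100; // two empty sentences count as identical
  return (100 * (longest - levenshtein(w1, w2))) / longest;
}

console.log(levenshteinSimilarity(
  "The red fox is moving on the hill.",
  "The black fox is moving in the bill."
)); // → 62.5
```

Here three of the eight words are substituted (red/black, on/in, hill/bill), so the score is 62.5%. Calling `levenshtein` on character arrays such as `[...s1]` instead gives the usual character-level distance.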

Try this solution: JS string diff

Use the Jaccard index. You can find implementations in any language, including JavaScript (here is one; I didn't test it personally, though).
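A minimal sketch of the Jaccard index on the sets of words in each sentence, |A ∩ B| / |A ∪ B|, scaled to a percentage. The tokenizer and the function names are assumptions, not taken from the linked implementation:

```javascript
// Sketch: Jaccard index on word sets, reported as 0-100.
// Assumption: lowercase tokens made of letters, digits and apostrophes.

// The set of distinct lowercase words in a sentence.
function wordSet(sentence) {
  return new Set(sentence.toLowerCase().match(/[a-z0-9']+/g) || []);
}

function jaccardSimilarity(s1, s2) {
  const a = wordSet(s1);
  const b = wordSet(s2);
  const union = new Set([...a, ...b]);
  if (union.size === 0) return 100; // two empty sentences count as identical
  let intersection = 0;
  for (const word of a) {
    if (b.has(word)) intersection++;
  }
  return (100 * intersection) / union.size;
}

console.log(jaccardSimilarity(
  "The red fox is moving on the hill.",
  "The black fox is moving in the bill."
)); // → 40
```

For the two example sentences the shared words are "the", "fox", "is" and "moving", 4 of 10 distinct words, so the result is 40. Because it compares sets, word order and repeated words are ignored.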

This is what I would do, depending on how important this is. If it is medium to low priority, here is a simple algorithm:

  1. Scan all sentences and see how often each word occurs.
  2. Filter out the most common words, e.g. those that appear in 30% or more of the sentences, i.e. don't count these. That way words like "at", "the" and "as" would hopefully not be counted.
  3. Then do your bag-of-words comparison.
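The three steps above can be sketched as follows. The answer does not say which bag-of-words comparison to use, so this assumes a simple one (sum of the smaller word counts divided by sum of the larger word counts); the 30% threshold follows the answer, and the function names and the small corpus are made up for illustration:

```javascript
// Sketch of the three steps (hypothetical function names).
// Assumptions: a word is "common" if it occurs in at least 30% of the
// sentences, and the bag-of-words comparison is
// sum(min count) / sum(max count) over all remaining words.

// Lowercase the sentence and keep runs of letters, digits and apostrophes.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Step 1: count in how many sentences each word occurs.
function documentFrequency(corpus) {
  const df = new Map();
  for (const sentence of corpus) {
    for (const word of new Set(tokenize(sentence))) {
      df.set(word, (df.get(word) || 0) + 1);
    }
  }
  return df;
}

// Step 2: collect the words that occur in at least `threshold` of the sentences.
function commonWords(corpus, threshold = 0.3) {
  const common = new Set();
  for (const [word, count] of documentFrequency(corpus)) {
    if (count / corpus.length >= threshold) common.add(word);
  }
  return common;
}

// Step 3: word-count bags without the common words, compared by overlap.
function bag(sentence, stopWords) {
  const counts = new Map();
  for (const word of tokenize(sentence)) {
    if (!stopWords.has(word)) counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

function bagSimilarity(s1, s2, stopWords) {
  const a = bag(s1, stopWords);
  const b = bag(s2, stopWords);
  let shared = 0;
  let total = 0;
  for (const word of new Set([...a.keys(), ...b.keys()])) {
    shared += Math.min(a.get(word) || 0, b.get(word) || 0);
    total += Math.max(a.get(word) || 0, b.get(word) || 0);
  }
  return total === 0 ? 100 : (100 * shared) / total;
}

// Usage with a small, made-up corpus.
const corpus = [
  "The red fox is moving on the hill.",
  "The black fox is moving in the bill.",
  "The cat sat on the mat.",
  "A dog barked at the mailman.",
  "Birds fly south in the winter.",
  "She reads a book every night.",
  "He is cooking dinner for the family.",
  "We walked along the river.",
  "The train leaves at noon.",
  "They painted the fence green.",
];
const stopWords = commonWords(corpus); // {"the", "is"} for this corpus
console.log(bagSimilarity(corpus[0], corpus[1], stopWords)); // → 25
```

With this corpus only "the" and "is" reach the 30% threshold; "at" and "on" occur in just two of the ten sentences, so they are still counted. The filter only behaves as intended on a reasonably large corpus: with just the two example sentences, every word occurs in 50% of them and would be filtered out.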

But the context in which you want to do this is really important. For instance, the example you gave could be for students learning English. There are different algorithms I would use if I were trying to see whether crowdsourced users are describing the same paragraph, versus whether article topics are similar enough for a suggested-reading section.

A common method to compute the similarity of two sentences is cosine similarity. I don't know whether an implementation exists in JavaScript. Cosine similarity looks at words, not single letters. The web is full of explanations, for example here.
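A minimal sketch in JavaScript, assuming each sentence is turned into a vector of raw word counts. The function names are hypothetical and this is not a specific library's implementation:

```javascript
// Sketch: cosine similarity between two sentences, scaled to 0-100.
// Assumption: each sentence becomes a vector of raw word counts
// (no stemming, stop-word removal or TF-IDF weighting).

// Map each lowercase word to the number of times it occurs.
function termCounts(sentence) {
  const counts = new Map();
  for (const word of sentence.toLowerCase().match(/[a-z0-9']+/g) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Euclidean length of a count vector.
function norm(counts) {
  let sum = 0;
  for (const c of counts.values()) sum += c * c;
  return Math.sqrt(sum);
}

// cos(a, b) = (a · b) / (|a| |b|)
function cosineSimilarity(s1, s2) {
  const a = termCounts(s1);
  const b = termCounts(s2);
  let dot = 0;
  for (const [word, count] of a) dot += count * (b.get(word) || 0);
  const denominator = norm(a) * norm(b);
  // Undefined if either sentence has no words; returning 0 is an arbitrary choice.
  return denominator === 0 ? 0 : (100 * dot) / denominator;
}

console.log(cosineSimilarity(
  "The red fox is moving on the hill.",
  "The black fox is moving in the bill."
).toFixed(1)); // prints 70.0
```

The dot product is 7 ("the" occurs twice in each sentence, contributing 2×2, plus 1 each for "fox", "is" and "moving"), and each vector has length √10, so the score is about 70%. Unlike the Jaccard index, repeated words raise the score.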

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM