
Check how much a String sounds like another one in Java

Original 2010-03-17 09:38:15 2 9 java/ string

I'd like to know if there is any class in Java that can check, using its own criteria, how similar one String is to another. Example:

William Shakespeare / William Shakespeare : might be 100%
William Shakespe**a**re / William Shakespe**e**re : might have above 90%
William Shakespeare / Shakespeare, William : might have above 70% (just examples)

9 answers

I see two main candidates:

The Soundex encoding, implemented in Apache Commons. However, note that it is mainly meant for single, relatively short words, so it won't find a similarity in your third example. Additionally, it really only works for English words.
The Levenshtein distance (again implemented in Apache Commons). This is language-agnostic, but the similarity for switched parts, as in your third example, will be relatively low (more like 40%). Modifications like the Damerau–Levenshtein distance may yield better results.
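As a rough illustration of the Damerau–Levenshtein idea mentioned above, here is a minimal sketch of the restricted variant (often called optimal string alignment), which adds adjacent transposition to the usual insert/delete/substitute operations. The class and method names are made up for this example and are not part of Apache Commons.

```java
// Hypothetical helper, not a library class: restricted Damerau-Levenshtein
// (optimal string alignment) distance between two strings.
final class OsaDistance {

    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        // Transforming a prefix into the empty string costs one delete per character.
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;

        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1,      // delete
                                 d[i][j - 1] + 1),     // insert
                        d[i - 1][j - 1] + cost);       // substitute or match
                // Adjacent transposition, e.g. "sp" <-> "ps", counts as one edit.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // Two swapped letters are a single edit here; plain Levenshtein would count two.
        System.out.println(distance("Shakespeare", "Shakepseare"));
    }
}
```

Note that this only helps with swapped neighbouring characters. Swapped words, as in "Shakespeare, William", still produce a large distance.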

You have to use a "soft" string metric:

SoundEx
Metaphone
Hamming distance
Levenshtein distance
...

There are many others, see String Metrics for an overview.

The best algorithm depends heavily on the problem domain. For example, SoundEx degrades for Eastern European names, and the Hamming distance does not help much if you want to compare the similarity of "real world" words.
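To see why the Hamming distance is a poor fit for free-form words, consider this small sketch (the class name is invented for illustration). The metric simply counts positions at which two strings differ, so it is only defined for strings of equal length, and a single missing letter already puts the inputs out of scope.

```java
// Hypothetical helper: Hamming distance, defined only for equal-length strings.
final class HammingDistance {

    static int distance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Hamming distance needs equal-length strings");
        }
        int differences = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                differences++;
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Same length, one substituted letter: this works.
        System.out.println(distance("Shakespeare", "Shakespeere"));
        // A dropped letter changes the length, so the metric no longer applies.
        try {
            distance("Shakespeare", "Shakspeare");
        } catch (IllegalArgumentException e) {
            System.out.println("not comparable: " + e.getMessage());
        }
    }
}
```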

Generally, there is the Levenshtein algorithm, which just outputs how many insert/update/delete operations you would have to perform (character by character) in order to transform one string into another. Apache's StringUtils class has an implementation.
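If you would rather not pull in a library, the algorithm is short enough to sketch by hand. The version below is a plain dynamic-programming implementation with invented names; turning the distance into a percentage by dividing by the longer string's length is one common convention assumed here, not something the algorithm itself prescribes.

```java
// Hypothetical helper: Levenshtein distance plus a normalized similarity score.
final class LevenshteinSimilarity {

    /** Number of single-character inserts, deletes, and substitutions needed to turn a into b. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insert
                                 prev[j] + 1),      // delete
                        prev[j - 1] + cost);        // substitute or match
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1.0 for identical strings, 0.0 when every character differs. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(similarity("William Shakespeare", "William Shakespeare"));
        System.out.println(similarity("William Shakespeare", "William Shakespeere"));
        System.out.println(similarity("William Shakespeare", "Shakespeare, William"));
    }
}
```

With this normalization, the single-letter typo in the second pair scores 1 - 1/19, roughly 95%, while the reordered third pair scores far lower.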

You can use: Class Soundex

This is called SoundEx; look up "java soundex" for several implementations.

One of them is the Apache Soundex, which looks good (although I haven't used it myself).

Sounds like SoundEx; an implementation is available in Apache Commons.

You can try the SoundEx algorithm.

String matching is very problem-specific, because most of the time you will have the same characteristics of noise in your strings to be matched, be it extra punctuation, typos or spelling errors. You will need to find an algorithm that is appropriate for the problems in your input data if you are doing this on a wide scale.

Soundex will give you a degree of confidence that two strings sound the same, but you may have to do some upfront cleaning first (like removing punctuation and tokenizing the string into separate words).
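A rough sketch of that cleaning-then-encoding pipeline might look like the following. It is a hand-written American Soundex rather than the Apache Commons class, all names are invented for the example, and sorting the per-word codes so that word order is ignored is an extra assumption on top of what is described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: American Soundex per word, after stripping punctuation.
final class SoundexTokens {

    // Soundex digit for each letter A..Z; '0' marks letters that get no digit.
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char last = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters that share a code
            }
            char code = CODES.charAt(c - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code; // a vowel resets 'last', so a repeated code after it is kept
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    /** Splits on anything that is not a letter, encodes each word, and sorts the codes. */
    static List<String> encodeTokens(String text) {
        List<String> codes = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            String code = encode(token);
            if (!code.isEmpty()) {
                codes.add(code);
            }
        }
        Collections.sort(codes); // assumption: word order should not matter
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(encodeTokens("William Shakespeare"));
        System.out.println(encodeTokens("Shakespeare, William"));
        System.out.println(encodeTokens("William Shakespeere"));
    }
}
```

With the punctuation removed and the codes sorted, all three of the question's examples produce the same list of codes. That makes this a yes/no "sounds alike" check rather than a percentage.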

The best thing you can do is to run a test. There is an enormous number of different algorithms you can use, Levenshtein being a great one, as is Soundex (although your mileage will vary with your problem area). There are also variations on those two algorithms, BTW.

I suggest having a look at the SimMetrics and SecondString libraries, which have loads of string matching implementations (of the two, I prefer SecondString).

It sounds like you have an interesting problem to solve, good luck!

Try SimMetrics, an open source library that includes SoundEx and ChapmanMatchingSoundex. The latter would give a far better score for the examples given (e.g. "Will Shake" vs "Shake, Will"), because it applies a matching approach on top of SoundEx. Another metric you may want to try is the q-Grams metric in the same library: although it is not phonetic, it scores very well regardless (if not better) in differing name-matching tasks.
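To give a feel for why q-gram metrics cope well with reordered names, here is a small self-contained sketch. It is not the SimMetrics implementation: the class name, the choice of q = 2, the lower-casing, and the Dice-style scoring are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: similarity based on shared character q-grams (Dice coefficient).
final class QGramSimilarity {

    /** Counts every substring of length q in s. */
    static Map<String, Integer> qGrams(String s, int q) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + q <= s.length(); i++) {
            counts.merge(s.substring(i, i + q), 1, Integer::sum);
        }
        return counts;
    }

    /** 2 * shared q-grams / total q-grams, in the range 0.0 to 1.0. */
    static double similarity(String a, String b, int q) {
        Map<String, Integer> gramsA = qGrams(a.toLowerCase(Locale.ROOT), q);
        Map<String, Integer> gramsB = qGrams(b.toLowerCase(Locale.ROOT), q);

        int total = 0;
        for (int n : gramsA.values()) total += n;
        for (int n : gramsB.values()) total += n;
        if (total == 0) {
            // Both inputs are shorter than q; fall back to a direct comparison.
            return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
        }

        int shared = 0;
        for (Map.Entry<String, Integer> e : gramsA.entrySet()) {
            shared += Math.min(e.getValue(), gramsB.getOrDefault(e.getKey(), 0));
        }
        return 2.0 * shared / total;
    }

    public static void main(String[] args) {
        System.out.println(similarity("William Shakespeare", "William Shakespeere", 2));
        System.out.println(similarity("William Shakespeare", "Shakespeare, William", 2));
    }
}
```

Because q-grams are compared as a bag rather than by position, swapping the two words only changes the few q-grams that span the word boundary, so the reordered name still scores high.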

Advice on String Similarity Metrics (Java). Distance, sounds like or combo?

how to check that string starts with another string in java

how to check if one date is after another in java?

How to break a String into a string array, and check if the string is one thing or another

java - Get 4 strings from another string and check which one is different

How to check how much char space in 1 string

Check how much memory bufferedImage in java uses?

How to check to see if the characters of one string are in another string

how to compare first character of one string to 5th char of another string in Java? and check if they are same.I am getting error as below

How to check in Java if a string isn't drawn over another string?


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.



Question

9 answers

solution1
14 2010-03-17 09:46:21

solution2
7 2010-03-17 09:42:01

solution3
6 ACCEPTED 2010-03-17 09:40:29

solution4
2 2010-03-17 09:41:51

solution5
2 2010-03-17 09:42:17

solution6
2 2010-03-17 09:42:18

solution7
2 2010-03-17 09:42:28

solution8
0 2010-03-17 10:33:00

solution9
0 2010-03-18 12:22:50
