
C# comparing similar strings

I have a generic list with some filenames (LIST1) and another, larger generic list with the full set of names (LIST2). I need to match each name in LIST1 to the most similar name in LIST2. For example:

LIST1
- **MAIZE_SLIP_QUANTITY_3_9.1.aif**

LIST2
1- TUTORIAL_FAILURE_CLINCH_4.1.aif
2- **MAIZE_SLIP_QUANTITY_3_5.1.aif**
3- **MAIZE_SLIP_QUANTITY_3_9.2.aif**
4- TUTORIAL_FAILURE_CLINCH_5.1.aif
5- TUTORIAL_FAILURE_CLINCH_6.1.aif
6- TUTORIAL_FAILURE_CLINCH_7.1.aif
7- TUTORIAL_FAILURE_CLINCH_8.1.aif
8- TUTORIAL_FAILURE_CLINCH_9.1.aif
9- TUTORIAL_FAILURE_PUSH_4.1.aif

I've read about Levenshtein distance and used an implementation of it from a framework (SignumFramework Utilities). It returns a distance of 1 for both line 2 and line 3, but in my case line 3 is a better match than line 2.

Is there a better method for comparing similar strings? Something more flexible?

When comparing as strings, "9.2" is not a better match than "5.1" for "9.1". If you want the version numbers to be evaluated numerically, you have to parse the strings so that you can compare the string parts and the numerical parts separately.

There is a similar question here; perhaps some of the answers there are relevant?

Your similarity criterion could be a combination of several other criteria. One could be the Levenshtein distance; others might be, e.g., the longest common substring or the longest common prefix/suffix.

The closely related longest common subsequence problem (not to be confused with longest common substring) is actually a special case of edit distance in which substitutions are forbidden and the only allowed edit operations are exact character match, insert, and delete (see here).

Further metrics for string similarity are described here.
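As a rough illustration of combining criteria, the sketch below ranks candidates by Levenshtein distance first and breaks ties by the length of the shared prefix. The class and method names (`NameMatcher`, `Levenshtein`, `CommonPrefixLength`, `BestMatch`) are made up for this example and are not taken from SignumFramework; the choice of prefix length as the tie-breaker is also just an assumption about what "better match" means here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any framework mentioned above.
public static class NameMatcher
{
    // Classic two-row dynamic-programming Levenshtein distance
    // (insert, delete and substitute each cost 1).
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1),
                    prev[j - 1] + cost);
            }
            // The row just computed becomes the previous row.
            (prev, curr) = (curr, prev);
        }
        return prev[b.Length];
    }

    // Number of leading characters the two strings have in common.
    public static int CommonPrefixLength(string a, string b)
    {
        int n = Math.Min(a.Length, b.Length);
        int i = 0;
        while (i < n && a[i] == b[i]) i++;
        return i;
    }

    // Smallest edit distance wins; ties go to the longest shared prefix.
    public static string BestMatch(string target, IEnumerable<string> candidates)
    {
        return candidates
            .OrderBy(c => Levenshtein(target, c))
            .ThenByDescending(c => CommonPrefixLength(target, c))
            .First();
    }
}
```

With the lists from the question, lines 2 and 3 both have distance 1 from `MAIZE_SLIP_QUANTITY_3_9.1.aif`, but line 3 shares a longer prefix (it differs only in the last digit), so `BestMatch` picks line 3.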

A regular expression could be used to get the items that match the name. The version number could be captured in a regex group in the match and parsed into a .NET type (e.g. decimal) that you could then use to decide which one is closest.
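A minimal sketch of that idea follows. It assumes every filename has the shape `<name>_<number>.<number>.aif` and that the trailing `9.1` can be treated as a single decimal value; both are guesses based on the sample data, and the names `VersionMatcher`, `TryParse` and `Closest` are invented for the example. If the two numbers are really separate components (so that 9.10 should sort after 9.9), they would need to be parsed as two integers instead.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; the filename layout is an assumption.
public static class VersionMatcher
{
    // Greedy "name" backtracks to the last '_' that is followed by "digits.digits.aif".
    private static readonly Regex Pattern = new Regex(
        @"^(?<name>.+)_(?<version>\d+\.\d+)\.aif$",
        RegexOptions.IgnoreCase);

    // Splits a filename into its textual name and numeric version.
    public static bool TryParse(string fileName, out string name, out decimal version)
    {
        name = null;
        version = 0m;
        Match m = Pattern.Match(fileName);
        if (!m.Success) return false;

        name = m.Groups["name"].Value;
        // Invariant culture so that '.' is always the decimal separator.
        version = decimal.Parse(m.Groups["version"].Value, CultureInfo.InvariantCulture);
        return true;
    }

    // Among candidates with the same name part, returns the one whose
    // version is numerically closest to the target's, or null if none match.
    public static string Closest(string target, IEnumerable<string> candidates)
    {
        if (!TryParse(target, out string targetName, out decimal targetVersion))
            return null;

        string best = null;
        decimal bestGap = decimal.MaxValue;
        foreach (string candidate in candidates)
        {
            if (!TryParse(candidate, out string name, out decimal version)) continue;
            if (!string.Equals(name, targetName, StringComparison.OrdinalIgnoreCase)) continue;

            decimal gap = Math.Abs(version - targetVersion);
            if (gap < bestGap)
            {
                bestGap = gap;
                best = candidate;
            }
        }
        return best;
    }
}
```

For the question's data, only lines 2 and 3 share the name part `MAIZE_SLIP_QUANTITY_3`, and 9.2 is numerically closer to 9.1 than 5.1 is, so `Closest` returns line 3.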

There's a fairly exhaustive set of answers to this SO question. At the bottom is a link I posted to C# implementations of Soundex, Double Metaphone, PHP similarity and Levenshtein.
