How to check if 2 words have the same base or stem?
I'm trying to merge words that have the same base. Example:
or
At first I used
Word.Application().SynonymInfo[myWord, Word.WdLanguageID.wdEnglishUS];
to get synonyms of a word from word.dll. But I realized that I don't want to merge only synonyms; I want to merge words with the same base.
Is there any function I could use from word.dll, or from any other dll, that would tell me whether 2 words have the same base?
You are probably looking for Inflector, which is an open source library.
It is compatible with .NET 3.5.
Here is some sample code for it.
English has a lot of exceptions, but handling a few of the most common scenarios with your own small function will take care of 90% of cases.
There seem to be a few common scenarios:
a) Past tense: formed by adding the suffix "ed"
b) Plurals: formed by adding "s", "es"
c) Common suffixes for making adjectives:
d) Common suffixes for adverbs
e) Common suffixes used for converting a verb to a noun
So, by removing common suffixes from the words, we can merge the words that reduce to the same base.
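The suffix-removal idea can be sketched roughly as follows. This is only an illustration, not Inflector's API: the class name SimpleStemmer, the method names, and the minimum base length are hypothetical. The suffixes "ed", "s" and "es" come from the list above; "ing", "ly", "ness" and "ment" are assumed examples, since the list does not name the adjective, adverb and noun suffixes.

```csharp
using System;

public static class SimpleStemmer
{
    // Illustrative suffix list (partly an assumption, see the note above).
    // Longer suffixes come first so that "es" is tried before "s".
    private static readonly string[] Suffixes =
        { "ment", "ness", "ing", "ly", "ed", "es", "s" };

    // Hypothetical guard: keep at least this many characters after stripping,
    // so that short words such as "bus" or "red" are left alone.
    private const int MinBaseLength = 3;

    // Lower-cases the word and removes the first matching suffix, if any.
    public static string Strip(string word)
    {
        string w = word.Trim().ToLowerInvariant();
        foreach (string suffix in Suffixes)
        {
            if (w.EndsWith(suffix, StringComparison.Ordinal) &&
                w.Length - suffix.Length >= MinBaseLength)
            {
                return w.Substring(0, w.Length - suffix.Length);
            }
        }
        return w;
    }

    // Two words are merged when they reduce to the same stripped form.
    public static bool HaveSameBase(string a, string b)
    {
        return Strip(a) == Strip(b);
    }
}
```

For example, SimpleStemmer.HaveSameBase("walked", "walks") returns true because both words reduce to "walk". Spelling changes are not handled: "happiness" reduces to "happi", which does not match "happy", so pairs like that need the similarity check or a real stemmer.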
For less common scenarios, we could use a string similarity algorithm to check whether the two strings are similar, for example a Levenshtein distance implementation.
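A minimal sketch of such a Levenshtein distance check might look like this. The class name StringSimilarity, the AreSimilar helper and its maxDistance parameter are hypothetical, and a sensible threshold depends on your data; the distance computation itself is the standard dynamic-programming formulation, kept to two rows of the table.

```csharp
using System;

public static class StringSimilarity
{
    // Returns the number of single-character insertions, deletions and
    // substitutions needed to turn source into target.
    public static int LevenshteinDistance(string source, string target)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (target == null) throw new ArgumentNullException("target");

        // previous[j] is the distance between the first i-1 characters of
        // source and the first j characters of target; current is row i.
        int[] previous = new int[target.Length + 1];
        int[] current = new int[target.Length + 1];

        for (int j = 0; j <= target.Length; j++)
        {
            previous[j] = j; // turning "" into target[0..j) takes j insertions
        }

        for (int i = 1; i <= source.Length; i++)
        {
            current[0] = i; // turning source[0..i) into "" takes i deletions
            for (int j = 1; j <= target.Length; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.Min(Math.Min(insertion, deletion), substitution);
            }

            // Swap the rows so that "previous" is row i on the next pass.
            int[] tmp = previous;
            previous = current;
            current = tmp;
        }

        return previous[target.Length];
    }

    // Hypothetical helper: treats two words as similar when their
    // case-insensitive distance does not exceed maxDistance.
    public static bool AreSimilar(string a, string b, int maxDistance)
    {
        return LevenshteinDistance(a.ToLowerInvariant(), b.ToLowerInvariant()) <= maxDistance;
    }
}
```

For instance, StringSimilarity.LevenshteinDistance("kitten", "sitting") is 3, and StringSimilarity.AreSimilar("Run", "ran", 1) is true. Edit distance alone will also match unrelated words such as "cat" and "car", so it is probably best used as a fallback after suffix removal rather than on its own.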
Please also see the following Stack Overflow question:
Are there any Fuzzy Search or String Similarity Functions libraries written for C#?