Determining if two words are derived from the same root in Python
I'd like to write a function same_base(word1, word2) that returns True when word1 and word2 are two English words derived from the same root word. I realize that words can have multiple senses; I want the algorithm to be overzealous, returning True whenever it is possible to view the words as originating from the same place. Some false positives are OK; false negatives are not.
Typically, stemming and lemmatization would be used for this. Here's what I've tried:
[...] sung and sing, dig and dug, medication and medicine.
Does such a tool exist? Do I just need an extremely aggressive stemmer / lemmatizer combo, and if so, where would I find one?
The general task, as you've described it, is not possible from simple textual analysis of the input characters. English does not have consistent rules for handling words as they evolve.
Yes, an excellent lemmatiser will solve the straightforward cases for you, those that can be discerned by applying transformations common within that POS (such as irregular verbs).
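To make the "straightforward cases" concrete, here is a minimal, standard-library-only sketch of a lemmatiser-style check. It is not how any real lemmatiser is implemented: the IRREGULAR_FORMS table, the SUFFIXES list, and the toy_lemma helper are all hypothetical names with hand-picked entries, chosen only to illustrate the idea of a lookup for irregular forms plus regular suffix stripping.

```python
# Hypothetical, hand-written table of irregular forms. A real lemmatiser
# ships a far larger one; these entries are illustrative assumptions.
IRREGULAR_FORMS = {
    "sung": "sing",
    "sang": "sing",
    "dug": "dig",
}

# Regular inflectional suffixes, tried in this order (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def toy_lemma(word: str) -> str:
    """Return a rough base form: irregular table first, then suffix stripping."""
    word = word.lower()
    if word in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[word]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words like "sing" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def same_base(word1: str, word2: str) -> bool:
    """True when both words reduce to the same toy lemma."""
    return toy_lemma(word1) == toy_lemma(word2)
```

With this table, same_base("sung", "sing") and same_base("dig", "dug") come out True, but same_base("medication", "medicine") and same_base("gentle", "jaunty") come out False, because nothing in the spelling links those pairs. Those are exactly the false negatives the question wants to avoid.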
However, to eliminate false negatives, you must have complete coverage of the word's basis; complete coverage will require etymology, especially in cases where the root word isn't in the English language, or perhaps doesn't appear in the shortened word itself.
For instance, what software tool could tell you that dis and speculum have the same root (specere), but that species does not? How would you tell that gentle, gentile, genteel, and jaunty have the same root? You'll need the etymology to get 100% of the actual connections.
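If etymology data is available, the check itself becomes a set intersection. The sketch below assumes a word-to-roots table; the ROOTS dictionary is a hypothetical, hand-entered stand-in for a real etymological dictionary, and its few entries (including the "medicus" root for medication and medicine) are illustrative rather than a verified dataset.

```python
# Hypothetical etymology table: word -> set of root labels.
# A real solution would load this from an etymological dictionary;
# the entries below are hand-entered for illustration only.
ROOTS = {
    "gentle": {"gentilis"},
    "gentile": {"gentilis"},
    "genteel": {"gentilis"},
    "jaunty": {"gentilis"},
    "speculum": {"specere"},
    "medication": {"medicus"},
    "medicine": {"medicus"},
}


def same_base(word1: str, word2: str) -> bool:
    """True when the words are identical or share at least one listed root."""
    w1, w2 = word1.lower(), word2.lower()
    if w1 == w2:
        return True
    # Words missing from the table get an empty set, which shares nothing.
    roots1 = ROOTS.get(w1, set())
    roots2 = ROOTS.get(w2, set())
    return not roots1.isdisjoint(roots2)
```

Storing a set of roots per word lets a word with several senses match on any one of them, which leans toward the overzealous behaviour the question asks for. Coverage is still only as good as the table: a word that is absent from it matches nothing but itself.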