

How do I get the base form of a synonym or plural of a word in Python?

I would like to use Python to convert all synonyms and plural forms of words to the base version of the word.

For example, "babies" would become "baby", and so would "infant" and "infants".

I tried writing a naive version of plural-to-root code, but it doesn't always work correctly and misses a large number of cases:

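contents = ["buying", "stalls", "responsibilities"]
base_forms = []
for token in contents:
    # Naive suffix-stripping rules; they miss many English exceptions.
    if token.endswith("ies"):
        token = token[:-3] + "y"  # replace only the suffix, not every "ies"
    elif token.endswith("s"):
        token = token[:-1]
    elif token.endswith("ed"):
        token = token[:-2]
    elif token.endswith("ing"):
        token = token[:-3]
    base_forms.append(token)  # rebinding token alone does not change contents

print(base_forms)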
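A minimal sketch of one way to push that loop toward the stated goal, assuming you maintain your own lookup table: check a hand-written table of synonyms and irregular forms first, and fall back to the suffix rules only when a word isn't in it. The table entries and the names `BASE_FORMS`, `strip_suffix` and `to_base` are hypothetical illustrations; a real table would need to be far larger, which is why the answers below point to existing libraries.

```python
# Hypothetical hand-written table: synonyms and irregular forms -> chosen base word.
BASE_FORMS = {
    "infant": "baby",
    "infants": "baby",
    "men": "man",
    "people": "person",
    "geese": "goose",
}


def strip_suffix(token):
    """Apply the naive suffix rules from the question."""
    if token.endswith("ies"):
        return token[:-3] + "y"
    if token.endswith("s"):
        return token[:-1]
    if token.endswith("ed"):
        return token[:-2]
    if token.endswith("ing"):
        return token[:-3]
    return token


def to_base(token):
    """Look the word up first; use the suffix rules only as a fallback."""
    token = token.lower()
    if token in BASE_FORMS:
        return BASE_FORMS[token]
    return strip_suffix(token)


contents = ["Babies", "infant", "infants", "men", "buying", "stalls"]
print([to_base(t) for t in contents])
# ['baby', 'baby', 'baby', 'man', 'buy', 'stall']
```

The table handles what rules cannot (synonyms like "infant" and irregular plurals like "geese"), while the rules still cover regular forms such as "babies" and "stalls".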

I have not used this library before, so take this with a grain of salt. However, NodeBox Linguistics seems to be a reasonable set of scripts that will do exactly what you are looking for if you are on macOS. Check the link here: https://www.nodebox.net/code/index.php/Linguistics

Based on their documentation, it looks like you will be able to use lines like these:

import en  # NodeBox Linguistics

print(en.noun.singular("people"))
# person

print(en.verb.infinitive("swimming"))
# swim

etc.

In addition to the example above, another option to consider is a natural language processing library like NLTK. The reason I recommend an external library is that English has a lot of exceptions. As mentioned in my comment, consider words like class, fling, red, geese, etc., which would trip up the rules in the original question.
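To make that concrete, here is a small sketch that runs the question's suffix rules (with `replace` swapped for slicing) on those words, plus "swimming" from the NodeBox example. `naive_base` is a hypothetical helper name. Every result is wrong: the rules strip letters that aren't suffixes, over-strip doubled consonants, and leave irregular plurals untouched.

```python
def naive_base(token):
    # The suffix rules from the question, checked in the same order.
    if token.endswith("ies"):
        return token[:-3] + "y"
    if token.endswith("s"):
        return token[:-1]
    if token.endswith("ed"):
        return token[:-2]
    if token.endswith("ing"):
        return token[:-3]
    return token


for word in ["class", "fling", "red", "geese", "swimming"]:
    print(word, "->", naive_base(word))
# class -> clas      (should stay "class")
# fling -> fl        (should stay "fling")
# red -> r           (should stay "red")
# geese -> geese     (should be "goose")
# swimming -> swimm  (should be "swim")
```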

I built a Python library, Plurals and Countable, which is open source on GitHub. Its main purpose is to get plurals (yes, multiple plurals for some words), but it also solves this particular problem.

import plurals_counterable as pluc
pluc.pluc_lookup_plurals('men', strict_level='dictionary')

will return the following dictionary:

{
    'query': 'men', 
    'base': 'man', 
    'plural': ['men'], 
    'countable': 'countable'
}

The base field is what you need.
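To process a whole token list, a minimal sketch is to pass the lookup in as a function and read the `base` field from each result. `to_base_forms` and `fake_lookup` are hypothetical names; with the library installed you would pass something like `lambda w: pluc.pluc_lookup_plurals(w, strict_level='dictionary')` instead of the fake. How the library reports words it cannot find is not shown in this answer, so the helper assumes the result may be `None` or lack a usable `base`, and keeps the original token in that case.

```python
from typing import Callable, Iterable, List, Optional


def to_base_forms(tokens: Iterable[str],
                  lookup: Callable[[str], Optional[dict]]) -> List[str]:
    """Map each token to the 'base' field of its lookup result.

    Falls back to the token itself when the lookup returns nothing
    or the result has no usable 'base' value (an assumption about
    how unknown words are reported).
    """
    bases = []
    for token in tokens:
        result = lookup(token)
        base = result.get("base") if result else None
        bases.append(base or token)
    return bases


# Hypothetical stand-in returning a dict shaped like the example above.
def fake_lookup(word):
    if word == "men":
        return {"query": "men", "base": "man", "plural": ["men"],
                "countable": "countable"}
    return None


print(to_base_forms(["men", "stalls"], fake_lookup))
# ['man', 'stalls']
```

Injecting the lookup keeps the batching logic testable without network or dictionary access.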

The library actually looks up the words in dictionaries, so it takes some time to request, parse, and return results. Alternatively, you could use the REST API provided by Dictionary.video. You'll need to contact admin@dictionary.video to get an API key. The call would look like this:

import logging

import requests


def get_base(word, api_key):
    """Return the base form of a noun from the Dictionary.video API, or None."""
    url = 'https://dictionary.video/api/noun/plurals/%s?key=%s' % (word, api_key)
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()['base']
    logging.error(url + ' response: status_code[%d]' % response.status_code)
    return None


print(get_base('men', 'YOUR_API_KEY'))
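Because every lookup costs a network round trip, a text with many repeated words would repeat the same requests. A minimal sketch, assuming you wrap the request above in a function that takes one word and returns its base form (or `None`): decorate it with `functools.lru_cache` so each distinct word is fetched only once. `make_cached_lookup` and `slow_fetch` are hypothetical names, and `slow_fetch` stands in for the real API call.

```python
import functools


def make_cached_lookup(fetch_base, maxsize=4096):
    """Wrap a slow word -> base-form function so repeated words hit a cache.

    fetch_base is any function that takes a word and returns its base form
    (for example, one that calls a REST API).
    """
    @functools.lru_cache(maxsize=maxsize)
    def cached(word):
        return fetch_base(word)
    return cached


# Hypothetical slow lookup that records how often it is actually called.
calls = []


def slow_fetch(word):
    calls.append(word)
    return {"men": "man", "geese": "goose"}.get(word)


lookup = make_cached_lookup(slow_fetch)
tokens = ["men", "geese", "men", "men"]
print([lookup(t) for t in tokens])  # ['man', 'goose', 'man', 'man']
print(len(calls))                   # 2: only the distinct words were fetched
```

One caveat of this design: `lru_cache` also caches `None`, so a word whose request failed stays cached as `None` until you call `lookup.cache_clear()`.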

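Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM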