What would cause WordNetCorpusReader to have no attribute LazyCorpusLoader?
WordNetLemmatizer on dask.dataframe errors with 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
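I'm trying to apply stemming (lemmatization with NLTK's `WordNetLemmatizer`) to a Dask dataframe:

```python
from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()

def lemmatizing(sentence):
    # Lemmatize each whitespace-separated word and rejoin them with single spaces.
    stemSentence = ""
    for word in sentence.split():
        stem = wnl.lemmatize(word)
        stemSentence += stem
        stemSentence += " "
    stemSentence = stemSentence.strip()
    return stemSentence

# df is a Dask dataframe with a 'news_content' text column.
df['news_content'] = df['news_content'].apply(lemmatizing).compute()
```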
But I get the following error:

```
AttributeError: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
```
Thanks for your help.
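This happens because the `wordnet` module is "lazily loaded" and hasn't been evaluated yet. `nltk.corpus.wordnet` starts out as a `LazyCorpusLoader` that replaces itself with the real `WordNetCorpusReader` on first use. Dask's default scheduler runs your function in several threads, so if those threads trigger that first load at the same time, one of them can end up looking for the loader's private `__args` attribute on an object that has already turned into a `WordNetCorpusReader`, which is exactly the error above.

One hack to make it work is to use `WordNetLemmatizer()` once first, before using it on the Dask dataframe, e.g.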
```python
>>> from nltk.stem import WordNetLemmatizer
>>> import dask.dataframe as dd
>>> df = dd.read_csv('something.csv')
>>> df.head()
   text                     label
0  this is a sentence       1
1  that is a foo bar thing  0
>>> wnl = WordNetLemmatizer()
>>> wnl.lemmatize('cats')  # Use it once first, to "unlazify" wordnet.
'cat'
# Now you can use it with Dask dataframe's .apply() function.
>>> lemmatize_text = lambda sent: [wnl.lemmatize(word) for word in sent.split()]
>>> df['lemmas'] = df['text'].apply(lemmatize_text)
>>> df.head()
   text                     label  lemmas
0  this is a sentence       1      [this, is, a, sentence]
1  that is a foo bar thing  0      [that, is, a, foo, bar, thing]
```
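To see why the one-time warm-up helps, here is a toy model of the lazy-loading pattern in plain Python. It is not NLTK's actual code: `LazyLoader` and `RealReader` are hypothetical stand-ins, and the assumption is that swapping `__dict__` and `__class__` on first access approximates what `LazyCorpusLoader` does. It shows that after the swap, any code still trying to read the loader's private `__args` fails with the same kind of error as in the question, and that warming up in the main thread means worker threads never run the swap at all.

```python
import threading


class RealReader:
    """Hypothetical stand-in for the fully loaded corpus reader."""

    def __init__(self):
        self.loaded = True

    def lemmatize(self, word):
        # Toy rule for illustration only: drop a trailing "s".
        return word[:-1] if word.endswith("s") else word


class LazyLoader:
    """Toy lazy loader: on the first unknown attribute access, the object
    turns itself into a RealReader by swapping __dict__ and __class__."""

    def __init__(self):
        # Name-mangled to _LazyLoader__args (cf. NLTK's _LazyCorpusLoader__args).
        self.__args = ()

    def __load(self):
        real = RealReader(*self.__args)
        self.__dict__ = real.__dict__   # the private __args disappears here
        self.__class__ = RealReader

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. before loading.
        self.__load()
        return getattr(self, name)


reader = LazyLoader()
print(type(reader).__name__)  # LazyLoader
reader.lemmatize("cats")      # warm-up: the first access performs the swap
print(type(reader).__name__)  # RealReader

# A thread that was still inside the load when another thread finished the
# swap would look up self.__args on a RealReader. Calling the private load
# method directly reproduces that situation deterministically.
late_thread_error = None
try:
    LazyLoader._LazyLoader__load(reader)
except AttributeError as exc:
    late_thread_error = str(exc)
print(late_thread_error)  # 'RealReader' object has no attribute '_LazyLoader__args'

words = ["cats", "dogs", "birds", "tree"]
results = [None] * len(words)


def worker(i, word):
    # The reader is already a RealReader, so no thread runs the swap.
    results[i] = reader.lemmatize(word)


threads = [threading.Thread(target=worker, args=(i, w)) for i, w in enumerate(words)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # ['cat', 'dog', 'bird', 'tree']
```

The design point is simply ordering: the one-time, non-thread-safe transformation happens once in the main thread, before any parallel work starts.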
Alternatively, you can try `pywsd`:

```
pip install -U pywsd
```

Then, in code:
```python
>>> from pywsd.utils import lemmatize_sentence
Warming up PyWSD (takes ~10 secs)... took 9.131901025772095 secs.
>>> import dask.dataframe as dd
>>> df = dd.read_csv('something.csv')
>>> df.head()
   text                     label
0  this is a sentence       1
1  that is a foo bar thing  0
>>> df['lemmas'] = df['text'].apply(lemmatize_sentence)
>>> df.head()
   text                     label  lemmas
0  this is a sentence       1      [this, be, a, sentence]
1  that is a foo bar thing  0      [that, be, a, foo, bar, thing]
```