How to pass a column from a data frame into wordnet.synsets() in NLTK python
I have a dataframe in which one of the columns contains English words. I want to pass each element of that column through NLTK's synsets() function. My issue is that synsets() only takes a single word at a time.
e.g. wordnet.synsets('father')
Now if I have a dataframe like:
import pandas as pd
dc = {'A':[0,9,4,5],'B':['father','mother','kid','sister']}
df = pd.DataFrame(dc)
df
   A       B
0  0  father
1  9  mother
2  4     kid
3  5  sister
I want to pass column B through the synsets() function and have another column that contains its output. I want to do this without iterating through the dataframe.
How do I do that?
You could use the apply method:
In [3]: from nltk.corpus import wordnet
In [4]: df['C'] = df['B'].apply(wordnet.synsets)
In [5]: df
Out[5]:
   A       B                                                  C
0  0  father  [Synset('father.n.01'), Synset('forefather.n.0...
1  9  mother  [Synset('mother.n.01'), Synset('mother.n.02'),...
2  4     kid  [Synset('child.n.01'), Synset('kid.n.02'), Syn...
3  5  sister  [Synset('sister.n.01'), Synset('sister.n.02'),...
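Note that apply avoids writing an explicit loop, but it still calls the function once per element rather than vectorizing the lookup. The sketch below illustrates this with `fake_synsets`, a hypothetical stand-in for wordnet.synsets (used here only so the example runs without the WordNet corpus); it records every word it receives.

```python
import pandas as pd

calls = []  # records each word the callback is invoked with

def fake_synsets(word):
    # Hypothetical stand-in for wordnet.synsets: returns a one-item list per word.
    calls.append(word)
    return [word + ".n.01"]

df = pd.DataFrame({"A": [0, 9, 4, 5],
                   "B": ["father", "mother", "kid", "sister"]})

# Same pattern as df['B'].apply(wordnet.synsets): one call per element of column B.
df["C"] = df["B"].apply(fake_synsets)
```

After this runs, `calls` holds one entry per row, and each cell of column C is a list, just as with the real wordnet.synsets.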
However, a column of lists is usually not a very useful data structure. It might be better to put each synset in its own column. You can do that by making the callback function return a pd.Series:
In [29]: df.join(df['B'].apply(lambda word: pd.Series([w.name() for w in wordnet.synsets(word)])))
Out[29]:
   A       B            0                1            2                   3  \
0  0  father  father.n.01  forefather.n.01  father.n.03  church_father.n.01
1  9  mother  mother.n.01      mother.n.02  mother.n.03         mother.n.04
2  4     kid   child.n.01         kid.n.02     kyd.n.01          child.n.02
3  5  sister  sister.n.01      sister.n.02  sister.n.03           baby.n.05

             4                     5             6         7           8
0  father.n.05           father.n.06  founder.n.02  don.n.03  beget.v.01
1  mother.n.05           mother.v.01    beget.v.01       NaN         NaN
2     kid.n.05  pull_the_leg_of.v.01      kid.v.02       NaN         NaN
3          NaN                   NaN           NaN       NaN         NaN
(I've chosen to display just the name of each Synset; in NLTK 3 and later name() is a method, whereas older releases exposed it as the attribute name. You could of course use
df.join(df['B'].apply(lambda word: pd.Series(wordnet.synsets(word))))
if you want the Synset objects themselves.)
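As a minimal sketch, the pd.Series pattern above can be wrapped in a small helper. The names `expand_lookup` and `fake_lookup` are hypothetical and not part of pandas or NLTK; `fake_lookup` stands in for a WordNet call so the example runs without the corpus installed.

```python
import pandas as pd

def expand_lookup(df, column, lookup):
    """Join one new column per item returned by lookup(word) onto df.

    Rows whose lookup returns fewer items are padded with NaN.
    """
    expanded = df[column].apply(
        # dtype=object keeps an empty result from defaulting to a float Series
        lambda word: pd.Series(lookup(word), dtype=object)
    )
    return df.join(expanded)

# Hypothetical stand-in data; with NLTK one would look up real synsets instead.
FAKE = {"father": ["father.n.01", "forefather.n.01"], "kid": ["child.n.01"]}

def fake_lookup(word):
    return FAKE.get(word, [])

demo = expand_lookup(
    pd.DataFrame({"A": [0, 4], "B": ["father", "kid"]}), "B", fake_lookup
)
```

With NLTK installed and the WordNet corpus downloaded, the same helper could presumably be called as expand_lookup(df, 'B', lambda w: [s.name() for s in wordnet.synsets(w)]).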