NLTK and Pandas - adding synsets into a list

I want to create a list that is added as a new row to a dataframe.

import pandas as pd
from nltk.corpus import wordnet

Overviewdataframe = pd.DataFrame([])
synonyms = []

for syn in wordnet.synsets("active"):
    for l in syn.lemmas():
        synonyms.append(l.name())
        # Note: DataFrame.append was removed in pandas 2.0, so this line only runs on older versions.
        Overviewdataframe = Overviewdataframe.append(synonyms)
        synonyms = []

Instead, the row is added as a column. Can you help me, please?

Thank you.

TL;DR

import pandas as pd
from nltk.corpus import wordnet as wn

wordlist = ['active', 'fan', 'hop', 'grace']

# Build one dictionary (one future row) per synset of each word.
words2lemmanames = [{'word': word, 'synset': ss.name(), 'lemma_names': ss.lemma_names()}
                    for word in wordlist for ss in wn.synsets(word)]
pd.DataFrame(words2lemmanames)

In Long

When querying the WordNet interface in NLTK, looking up a word returns a list of "concepts", also known as "synsets":

>>> wn.synsets('active')

[Synset('active_agent.n.01'), Synset('active_voice.n.01'), Synset('active.n.03'), Synset('active.a.01'), Synset('active.s.02'), Synset('active.a.03'), Synset('active.s.04'), Synset('active.a.05'), Synset('active.a.06'), Synset('active.a.07'), Synset('active.s.08'), Synset('active.a.09'), Synset('active.a.10'), Synset('active.a.11'), Synset('active.a.12'), Synset('active.a.13'), Synset('active.a.14')]

Each synset has its own list of lemma names, i.e.

>>> wn.synsets('active')[0].lemma_names()
['active_agent', 'active']
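
Since each synset yields its own list, one way to get one row per synset (which seems to be what the question is after) is to hand pandas a list of lists. The following is only a rough sketch: `lemma_lists_to_rows` is a hypothetical helper name, and the second list in the demo is a made-up placeholder rather than real WordNet output.

```python
import pandas as pd


def lemma_lists_to_rows(lemma_lists):
    """Build a dataframe with one row per list of lemma names.

    Rows shorter than the longest one are padded with missing values.
    """
    return pd.DataFrame(list(lemma_lists))


# The first list is the lemma_names() output shown above; the second is a
# made-up placeholder so that the two rows have different lengths.
demo = lemma_lists_to_rows([["active_agent", "active"], ["placeholder"]])
print(demo.shape)
```

With NLTK and its WordNet data installed, the same helper could presumably be called as `lemma_lists_to_rows(ss.lemma_names() for ss in wn.synsets('active'))`.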

You can also access a synset directly by its "name". The usual convention for the name is (i) the first lemma name, then a dot, (ii) the POS tag, then a dot, and (iii) the index number.

>>> wn.synsets('active')[0] == wn.synset('active_agent.n.01')
True
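
To make the naming convention concrete, here is a small sketch that splits a synset name into its three parts using plain string handling. `split_synset_name` is a hypothetical helper written for illustration; it is not part of NLTK, and it assumes the name follows the convention described above.

```python
def split_synset_name(name):
    """Split a synset name such as 'active_agent.n.01' into (lemma, pos, index)."""
    # Split from the right so that a lemma containing a dot stays in one piece.
    lemma, pos, index = name.rsplit(".", 2)
    return lemma, pos, int(index)


parts = split_synset_name("active_agent.n.01")
print(parts)  # ('active_agent', 'n', 1)
```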

Finally, given a list of dictionaries (i.e. one set of key-value pairs per row), you can feed it into pandas.DataFrame to convert it into a dataframe; the keys become the column names.
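
As a minimal illustration of that last step, the sketch below builds a dataframe from hand-written dictionaries instead of querying WordNet. The first row reuses the values shown earlier; the second row is a made-up placeholder (not real WordNet data) that is only there to show that each dictionary becomes one row.

```python
import pandas as pd

rows = [
    # Values taken from the wn.synsets('active')[0] example above.
    {"word": "active", "synset": "active_agent.n.01", "lemma_names": ["active_agent", "active"]},
    # Placeholder row, invented for illustration only.
    {"word": "placeholder", "synset": "placeholder.x.00", "lemma_names": ["placeholder"]},
]

df = pd.DataFrame(rows)
print(df.shape)          # (2, 3)
print(list(df.columns))  # ['word', 'synset', 'lemma_names']
```

Each dictionary key becomes a column, and the list of lemma names is stored as a single cell value in its row.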
