Python: how to add a list of tokens to a new DataFrame column

Question

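I have a large DataFrame with more than 50 rows. Each row has a "tokens" column containing a large number of text tokens. I used a for loop together with a frequency distribution to find the top 10 tokens in the "tokens" column of each row.

I am trying to add a new column named "top10" to the DataFrame, so that for each row the "top10" column contains that row's top 10 tokens.

This is the code I currently use to find the top 10 tokens for each row: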

import nltk

for i in range(len(df)):
    tokens = df.iloc[i]['tokens']
    word_frequency = nltk.FreqDist(tokens)  # count occurrences of each token
    print(" ", word_frequency.most_common(10))

A sample of my DataFrame:

id location about age tokens
1    usa     ...  20   ['jim','hi','hello'......]
...
... 
40    uk     ...  50   ['bobby','hi','hey'......]

Expected output:

id location about age tokens                           top10
1    usa     ...  20   ['jim','hi','hello'......]   ['hi', 'paddy'....]
...
... 
40    uk     ...  50   ['bobby','hi','hey'......]   ['john', 'python'..]

The top10 column should list the words in descending order of frequency.

Any help is appreciated, thanks!

Answer 1

Here is a simple way to add a new column to the DataFrame:

df['top10'] = word_frequency.most_common(10)
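Note that word_frequency in the line above holds the counts for a single row, so a per-row column needs one list per row. A minimal sketch of that idea, assuming each cell of "tokens" is a Python list. collections.Counter stands in for nltk.FreqDist here (FreqDist is a Counter subclass), and the sample rows and the top_words name are made up for illustration:

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data: one list of tokens per row
df = pd.DataFrame({
    'id': [1, 2],
    'tokens': [
        ['hi', 'jim', 'hi', 'hello', 'hi', 'jim'],
        ['hey', 'bobby', 'hey', 'hi', 'hey', 'bobby'],
    ],
})

# Collect one list of top words per row, most frequent first
top_words = []
for tokens in df['tokens']:
    counts = Counter(tokens)
    top_words.append([word for word, _ in counts.most_common(10)])

# Wrapping the lists in a Series aligned to df.index stores each list in a single cell
df['top10'] = pd.Series(top_words, index=df.index)
print(df[['id', 'top10']])
```

Each cell of top10 then holds a plain list of words, dropping the counts that most_common returns alongside them.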

Answer 2

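It is better to use pandas apply with the keyword arguments result_type='reduce' (do not expand the returned list) and axis=1 (iterate over rows rather than over columns, which is the default), since you are already iterating over rows. Otherwise pandas will interpret your list as a Series, which does not fit in a single cell.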

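import pandas as pd
import nltk

# Build a 10-row example DataFrame whose 'tokens' column holds a list of tokens per row
df = pd.DataFrame({x: {'tokens': ['hello', 'python', 'is', 'is', 'is', 'dog', 'god', 'cat', 'act', 'fraud', 'hola', 'the', 'a', 'the', 'on', 'no', 'of', 'foo', 'foo']} for x in range(0, 10)}).T


def most_common_words_list(x):
    # most_common(2) returns (word, count) tuples; keep only the words
    word_count_tups = nltk.FreqDist(x['tokens']).most_common(2)
    return [word for word, count in word_count_tups]


# axis=1 passes each row to the function; result_type='reduce' keeps the list in one cell
df['top2'] = df.apply(most_common_words_list, result_type='reduce', axis=1)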
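Because only the "tokens" column is needed, the same kind of result can also be sketched with Series.apply on that column, which avoids the axis and result_type arguments altogether. This is an alternative and not part of the original answer. collections.Counter is used in place of nltk.FreqDist, and the sample rows and the most_common_words helper are hypothetical:

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data: one list of tokens per row
df = pd.DataFrame({
    'tokens': [
        ['is', 'python', 'is', 'foo', 'is', 'foo', 'dog'],
        ['the', 'cat', 'the', 'the', 'cat', 'act'],
    ],
})


def most_common_words(tokens, n=2):
    """Return the n most frequent tokens, most frequent first."""
    return [word for word, _ in Counter(tokens).most_common(n)]


# Series.apply calls the function once per cell and keeps each returned list in one cell
df['top2'] = df['tokens'].apply(most_common_words)
print(df['top2'].tolist())
```

Working on the single column means the function receives the token list directly instead of a whole row.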


Solution 1
0 2019-02-12 11:47:21

Solution 2
0 2019-02-12 11:56:16
