Update a pandas DataFrame and update values where data exists

Question

I have a csv file like this:

word, tag, counter
I, Subject, 1
Love, Verb, 3
Love, Adjective, 1

I want to create a DataFrame with one row per word and one column per tag, like this:

Word Subject  Verb  Adjective
I     1        0     0
Love  0        3     1

How can I do this with pandas?

Answer 1

You can use pivot:

df = df.pivot(index='word', columns='tag', values='counter').fillna(0).astype(int)
print (df)
tag   Adjective  Subject  Verb
word                          
I             0        1     0
Love          1        0     3
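
For a version that runs end to end, here is a minimal sketch that loads the sample data from an in-memory string rather than a file. The name csv_text and the use of skipinitialspace=True are my assumptions: the sample has a space after each comma, and without that option the columns would be named ' tag' and ' counter'. The final reindex only reorders the columns to match the layout asked for in the question.

```python
import io

import pandas as pd

# The sample CSV from the question; note the space after each comma.
csv_text = """word, tag, counter
I, Subject, 1
Love, Verb, 3
Love, Adjective, 1
"""

# skipinitialspace=True drops the space that follows each delimiter, so the
# columns are named 'tag' and 'counter' rather than ' tag' and ' counter'.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

wide = (
    df.pivot(index="word", columns="tag", values="counter")
    .fillna(0)      # missing (word, tag) pairs become 0
    .astype(int)    # NaN forced floats, so cast back to integers
    .reindex(columns=["Subject", "Verb", "Adjective"])  # question's order
)
print(wide)
```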

Another solution uses set_index and unstack:

df = df.set_index(['word','tag'])['counter'].unstack(fill_value=0)
print (df)
tag   Adjective  Subject  Verb
word                          
I             0        1     0
Love          1        0     3
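
To check that the two approaches agree, the following sketch builds the sample frame inline (an assumption, to avoid needing the file) and compares the results cell by cell. One practical difference: unstack(fill_value=0) fills the gaps directly, so it does not need the fillna(0).astype(int) round trip that pivot does.

```python
import pandas as pd

# Sample data from the question, built inline for illustration.
df = pd.DataFrame({
    "word": ["I", "Love", "Love"],
    "tag": ["Subject", "Verb", "Adjective"],
    "counter": [1, 3, 1],
})

# pivot leaves NaN for missing pairs, so fill and cast afterwards.
via_pivot = (
    df.pivot(index="word", columns="tag", values="counter")
    .fillna(0)
    .astype(int)
)

# unstack can fill missing pairs while reshaping.
via_unstack = df.set_index(["word", "tag"])["counter"].unstack(fill_value=0)

# Element-wise comparison; both frames carry the same row and column labels.
same = bool((via_pivot == via_unstack).all().all())
print(same)
```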

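But if you get:

ValueError: Index contains duplicate entries, cannot reshape

then the same (word, tag) pair occurs more than once, and you need pivot_table with an aggfunc to combine the duplicates: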

print (df)
   word        tag  counter
0     I    Subject        1
1  Love       Verb        3
2  Love  Adjective        1 <-duplicates for Love and Adjective
3  Love  Adjective        3 <-duplicates for Love and Adjective

df = df.pivot_table(index='word', 
                    columns='tag', 
                    values='counter', 
                    aggfunc='mean', 
                    fill_value=0)
print (df)
tag   Adjective  Subject  Verb
word                          
I             0        1     0
Love          2        0     3
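
Before picking an aggfunc it can help to look at the offending rows. The sketch below reproduces the error, lists the duplicated (word, tag) pairs with DataFrame.duplicated, and then aggregates with 'sum' instead of 'mean'. Using 'sum' is my assumption that the counters are occurrence counts that should be added together; keep 'mean' if an average is what you want.

```python
import pandas as pd

# Sample data with a duplicated (Love, Adjective) pair, built inline.
df = pd.DataFrame({
    "word": ["I", "Love", "Love", "Love"],
    "tag": ["Subject", "Verb", "Adjective", "Adjective"],
    "counter": [1, 3, 1, 3],
})

# pivot cannot place two values into one cell, so it raises.
try:
    df.pivot(index="word", columns="tag", values="counter")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# keep=False marks every row of a duplicated pair, not only the later ones.
dupes = df[df.duplicated(["word", "tag"], keep=False)]
print(dupes)

# Assumption: counters for the same pair should be added together.
totals = df.pivot_table(index="word", columns="tag", values="counter",
                        aggfunc="sum", fill_value=0)
print(totals)
```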

Another solution uses groupby and unstack:

df = df.groupby(['word','tag'])['counter'].mean().unstack(fill_value=0)
print (df)
tag   Adjective  Subject  Verb
word                          
I             0        1     0
Love          2        0     3
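
One thing to keep in mind with the groupby version: mean() returns floats, so the reshaped frame has a float dtype even when every mean happens to be a whole number. A small sketch, with the sample data built inline as an assumption; the cast to int at the end is only safe when you know the means are whole numbers, since it truncates otherwise.

```python
import pandas as pd

# Sample data with a duplicated (Love, Adjective) pair, built inline.
df = pd.DataFrame({
    "word": ["I", "Love", "Love", "Love"],
    "tag": ["Subject", "Verb", "Adjective", "Adjective"],
    "counter": [1, 3, 1, 3],
})

# Average the counters per (word, tag) pair, then spread tags into columns.
means = df.groupby(["word", "tag"])["counter"].mean().unstack(fill_value=0)
print(means.dtypes)

# Cast back to integers; this truncates any fractional means.
as_int = means.astype(int)
print(as_int)
```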

Solution 1
2 Accepted 2017-02-23 14:01:45
