Adding dictionary values of common keys based on a categorical variable in another column

Question

I am trying to add up multiple dictionaries (summing the values of common keys), grouped by a categorical variable in another column. I tried using groupby (with agg), groupby (with sum), and Counter(). I have other continuous columns too, but I do not want to add them up. I keep getting errors or undesired output.

import pandas as pd       
import numpy as np
from collections import Counter

# input
df1 = pd.DataFrame([
['Cat1', {'Word1': 8, 'Word2': 7, 'Word3': 6, 'Word4':1}], 
['Cat2', {'Word2': 7, 'Word4': 7, 'Word3': 6}], 
['Cat2', {'Word3':3, 'Word5': 2}],
['Cat1', {'Word1': 10, 'Word3': 5, 'Word4':1}]], columns=list('AB'))



# desired output
df_out = pd.DataFrame([
['Cat1', {'Word1': 18, 'Word2': 7, 'Word3': 11, 'Word4':2}],
['Cat2', {'Word2': 7, 'Word3': 9, 'Word4': 7, 'Word5': 2}]], columns=list('AB'))
df_out

# Trial 1 - groupby
for i in range(len(df1)):
    # does not work: agg() needs an aggregation function, and df1['B'][i] is one row's dict of counts
    df1.groupby('A')['B'].agg(df1['B'][i])

# Trial 2 - Counter
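counter = Counter()
for d in range(len(df1['B'])):
    # fails: d is an integer position, not a dict, and a single Counter
    # would mix Cat1 and Cat2 together anyway
    counter.update(d)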

Any help is appreciated. TIA
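For reference, the Counter idea from Trial 2 can be made to work: Counter.update adds the values of keys it already holds, so the remaining problems are iterating over integer positions instead of the dicts themselves and using a single counter for every category. A minimal sketch, assuming the df1 above (the names totals and result are illustrative, not from the original post), keeps one Counter per category:

```python
from collections import Counter, defaultdict

import pandas as pd

df1 = pd.DataFrame([
    ['Cat1', {'Word1': 8, 'Word2': 7, 'Word3': 6, 'Word4': 1}],
    ['Cat2', {'Word2': 7, 'Word4': 7, 'Word3': 6}],
    ['Cat2', {'Word3': 3, 'Word5': 2}],
    ['Cat1', {'Word1': 10, 'Word3': 5, 'Word4': 1}]], columns=list('AB'))

# One Counter per category; Counter.update adds the values of keys it already holds.
totals = defaultdict(Counter)
for category, counts in zip(df1['A'], df1['B']):
    totals[category].update(counts)

# Rebuild a two-column frame shaped like df_out (categories sorted for a stable order).
result = pd.DataFrame(
    [[cat, dict(totals[cat])] for cat in sorted(totals)], columns=list('AB'))
print(result)
```

Only column B is read, so any other continuous columns in df1 are left untouched.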

Answer 1

Here's a solution which produces a regular DataFrame instead of a Series of dicts:

pd.DataFrame.from_records(df1.B).groupby(df1.A).sum()

The first step converts your Series of dicts into a regular DataFrame with one column per key; a key missing from a row's dict becomes NaN in that row. Then a simple groupby and sum gives the final result (sum skips NaN, which is why the columns that had gaps come out as floats):

      Word1  Word2  Word3  Word4  Word5
A                                      
Cat1   18.0    7.0     11    2.0    0.0
Cat2    0.0    7.0      9    7.0    2.0
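Looking at the intermediate DataFrame makes the float columns easier to understand. A minimal sketch, assuming df1 from the question; it builds the one-column-per-key frame with pd.DataFrame(df1['B'].tolist()) in place of from_records, and the final .astype(int) cast is an addition, not part of the answer:

```python
import pandas as pd

df1 = pd.DataFrame([
    ['Cat1', {'Word1': 8, 'Word2': 7, 'Word3': 6, 'Word4': 1}],
    ['Cat2', {'Word2': 7, 'Word4': 7, 'Word3': 6}],
    ['Cat2', {'Word3': 3, 'Word5': 2}],
    ['Cat1', {'Word1': 10, 'Word3': 5, 'Word4': 1}]], columns=list('AB'))

# Step 1: one column per key; a key missing from a row's dict becomes NaN,
# so every column with a gap is stored as float64.
wide = pd.DataFrame(df1['B'].tolist())
print(wide)

# Step 2: sum within each category. sum() skips NaN, so a category that never
# has a key gets 0 for it; every cell is then a whole number and can be cast back.
summed = wide.groupby(df1['A']).sum().astype(int)
print(summed)
```

The cast is only safe because sum() has already replaced every gap with 0; calling .astype(int) on wide itself would fail on its NaN cells.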

Keeping your data in this format will be much more efficient than a Series of dicts, since each column is a single numeric array rather than a collection of Python dict objects, unless the values are very sparse (i.e. the matrix is large and mostly zeros).

If you do need the result to be a Series of dicts, this works:

def add_dicts(s):
    # s holds the dicts for one category; Counter.update adds the values of shared keys
    c = Counter()
    for d in s:
        c.update(d)
    return dict(c)

df1.groupby('A').B.agg(add_dicts)

It produces exactly the dicts in your df_out, as a Series indexed by A; call .reset_index() on it to get the same two-column DataFrame.
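If you already have the wide, summed table from the first approach, you can also turn it back into one dict per category. A minimal sketch, assuming df1 from the question (wide_sum, records and out are illustrative names); it treats a total of 0 as "key never appeared in this category", which is an assumption that only holds when the original values are positive:

```python
import pandas as pd

df1 = pd.DataFrame([
    ['Cat1', {'Word1': 8, 'Word2': 7, 'Word3': 6, 'Word4': 1}],
    ['Cat2', {'Word2': 7, 'Word4': 7, 'Word3': 6}],
    ['Cat2', {'Word3': 3, 'Word5': 2}],
    ['Cat1', {'Word1': 10, 'Word3': 5, 'Word4': 1}]], columns=list('AB'))

# Wide table: one row per category, one column per key, gaps summed to 0.
wide_sum = pd.DataFrame(df1['B'].tolist()).groupby(df1['A']).sum()

# {category: {key: total}}; zeros stand in for keys the category never had.
records = wide_sum.to_dict(orient='index')

# Drop the zero entries and cast the float totals back to int.
out = pd.DataFrame(
    [[cat, {k: int(v) for k, v in row.items() if v != 0}] for cat, row in records.items()],
    columns=list('AB'))
print(out)
```

Unlike the Counter version, this drops any key whose genuine total is 0.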

Answer 1: accepted, score 0, 2018-07-14 01:38:04
