Crosstab in a Pandas DataFrame

Question

I created a DataFrame:

      A1  A2  A3  A4
0   cccc  xx   6   5
1   aaaa  yy   8   0
2   aaaa  xx  15   0
3   bbbb  xx  21   4
4   bbbb  xx  26   0
5   cccc  yy  33   2
6   aaaa  xx  44   1
7   cccc  xx  48   2
8   aaaa  yy  58   0
9   cccc  yy  59   5
10  bbbb  yy  77   0
11  bbbb  yy  99   0

Then, using crosstab() with the command below, I created a new DataFrame:

df5 = pd.crosstab(df4['A1'], df4['A2'], margins=False,values=df4['A3'] , 
                 dropna=False, aggfunc='mean').reset_index().fillna(0)

This works properly and gives me the following output:

A2   A1      xx      yy
0   aaaa    29.5    33.0
1   bbbb    23.5    88.0
2   cccc    27.0    46.0
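
For anyone who wants to reproduce the table above, here is a minimal self-contained sketch. It assumes df4 is built from the rows listed at the top of the question; the dictionary-based construction is only an illustration, not necessarily how the original frame was created.

```python
import pandas as pd

# Sample data copied from the question (construction is an assumption).
df4 = pd.DataFrame({
    "A1": ["cccc", "aaaa", "aaaa", "bbbb", "bbbb", "cccc",
           "aaaa", "cccc", "aaaa", "cccc", "bbbb", "bbbb"],
    "A2": ["xx", "yy", "xx", "xx", "xx", "yy",
           "xx", "xx", "yy", "yy", "yy", "yy"],
    "A3": [6, 8, 15, 21, 26, 33, 44, 48, 58, 59, 77, 99],
    "A4": [5, 0, 0, 4, 0, 2, 1, 2, 0, 5, 0, 0],
})

# Mean of A3 for every (A1, A2) combination; A1 becomes a regular column.
df5 = pd.crosstab(df4["A1"], df4["A2"], values=df4["A3"],
                  aggfunc="mean", dropna=False).reset_index().fillna(0)
print(df5)
```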

Now I want to store the mean values back into the DataFrame df4.

How can I do this? I want to replace the A4 values that are 0 in df4 with the corresponding means from crosstab(). I want the output to look like this:

    A1      A2  A3  A4    
0   aaaa    xx  15  29.5    
1   aaaa    xx  44  1.0    
2   aaaa    yy  8   33.0    
3   aaaa    yy  58  33.0    
4   bbbb    xx  21  4.0    
5   bbbb    xx  26  23.5    
6   bbbb    yy  77  88.0    
7   bbbb    yy  99  88.0    
8   cccc    xx  6   5.0    
9   cccc    xx  48  2.0

Answer 1

`mask` + `groupby` + `transform`

Ignoring the unnecessary reordering and removal of some rows in your desired output, you can use mask with groupby and transform:

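# Mean of A3 within each (A1, A2) group, aligned row by row with df4
group_mean = df4.groupby(['A1', 'A2'])['A3'].transform('mean')

# Replace A4 with the group mean only where A4 is 0
df4['A4'] = df4['A4'].mask(df4['A4'] == 0, group_mean)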

print(df4)

      A1  A2  A3    A4
0   cccc  xx   6   5.0
1   aaaa  yy   8  33.0
2   aaaa  xx  15  29.5
3   bbbb  xx  21   4.0
4   bbbb  xx  26  23.5
5   cccc  yy  33   2.0
6   aaaa  xx  44   1.0
7   cccc  xx  48   2.0
8   aaaa  yy  58  33.0
9   cccc  yy  59   5.0
10  bbbb  yy  77  88.0
11  bbbb  yy  99  88.0
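
The steps above can be put together into one runnable sketch. It assumes df4 is rebuilt from the rows in the question, and the final sort_values call is an optional extra (not part of the answer above) that orders rows as in the desired output. Unlike that desired output, it keeps all twelve rows.

```python
import pandas as pd

# Sample data copied from the question (construction is an assumption).
df4 = pd.DataFrame({
    "A1": ["cccc", "aaaa", "aaaa", "bbbb", "bbbb", "cccc",
           "aaaa", "cccc", "aaaa", "cccc", "bbbb", "bbbb"],
    "A2": ["xx", "yy", "xx", "xx", "xx", "yy",
           "xx", "xx", "yy", "yy", "yy", "yy"],
    "A3": [6, 8, 15, 21, 26, 33, 44, 48, 58, 59, 77, 99],
    "A4": [5, 0, 0, 4, 0, 2, 1, 2, 0, 5, 0, 0],
})

# Per-row mean of A3 within each (A1, A2) group, aligned to df4's index.
group_mean = df4.groupby(["A1", "A2"])["A3"].transform("mean")

# Replace A4 only where it equals 0; all other values are kept.
df4["A4"] = df4["A4"].mask(df4["A4"] == 0, group_mean)

# Optional: order rows by A1, A2, A3 and renumber the index.
result = df4.sort_values(["A1", "A2", "A3"]).reset_index(drop=True)
print(result)
```

Because transform returns a Series with the same index as df4, no merge against the crosstab result is needed; the group means line up with the original rows directly.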
