Calculate the non-zero percentage for specific columns and for each row in pandas

Question

Suppose I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'mary', 'peter', 'jeff', 'bill', 'lisa', 'jose'],
    'gender': ['M', 'F', 'M', 'M', 'M', 'F', 'M'],
    'state': ['california', 'dc', 'california', 'dc', 'california', 'texas', 'texas'],
    'num_children': [2, 0, 0, 3, 2, 1, 4],
    'num_pets': [5, 1, 0, 5, 2, 2, 3],
})

    name gender       state  num_children  num_pets
0   john      M  california             2         5
1   mary      F          dc             0         1
2  peter      M  california             0         0
3   jeff      M          dc             3         5
4   bill      M  california             2         2
5   lisa      F       texas             1         2
6   jose      M       texas             4         3

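I want to create a new row and a new column, both named pct., that hold the percentage of zero values in the num_children and num_pets columns. Expected output: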

    name gender       state num_children num_pets  pct.
0   pct.                           28.6%    14.3%
1   john      M  california            2        5    0%
2   mary      F          dc            0        1   50%
3  peter      M  california            0        0  100%
4   jeff      M          dc            3        5    0%
5   bill      M  california            2        2    0%
6   lisa      F       texas            1        2    0%
7   jose      M       texas            4        3    0%

I have already calculated the percentage of zeros in the target columns for each row:

df['pct'] = df[['num_children', 'num_pets']].astype(bool).sum(axis=1)/2
df['pct.'] = 1-df['pct']
del df['pct']
df['pct.'] = pd.Series(["{0:.0f}%".format(val * 100) for val in df['pct.']], index = df.index)

    name gender       state  num_children  num_pets  pct.
0   john      M  california             2         5    0%
1   mary      F          dc             0         1   50%
2  peter      M  california             0         0  100%
3   jeff      M          dc             3         5    0%
4   bill      M  california             2         2    0%
5   lisa      F       texas             1         2    0%
6   jose      M       texas             4         3    0%

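But I don't know how to insert the result below into a pct. row, as in the expected output. Please help me get the expected result in a more Pythonic way. Thanks.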

df[['num_children', 'num_pets']].astype(bool).sum(axis=0)/len(df.num_children)
Out[153]: 
num_children    0.714286
num_pets        0.857143
dtype: float64
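
Note that the figures above are the shares of non-zero values, while the expected pct. row shows the shares of zeros. A minimal sketch of the conversion, assuming only the two numeric columns from the example (the names nonzero_share and zero_pct are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'num_children': [2, 0, 0, 3, 2, 1, 4],
                   'num_pets': [5, 1, 0, 5, 2, 2, 3]})

# Share of non-zero values per column (what the snippet above computes)
nonzero_share = df.astype(bool).sum(axis=0) / len(df)

# Share of zero values per column, formatted with one decimal place
zero_pct = (1 - nonzero_share).mul(100).apply("{0:.1f}%".format)
print(zero_pct)
```

This should give 28.6% for num_children and 14.3% for num_pets, matching the expected pct. row.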

Update: the same thing, but for sums. Many thanks to @jezrael:

df['sums'] = df[['num_children', 'num_pets']].sum(axis=1)
df1 = (df[['num_children', 'num_pets']].sum()
                                       .to_frame()
                                       .T
                                       .assign(name='sums'))

df = pd.concat([df1.reindex(columns=df.columns, fill_value=''), df], 
                ignore_index=True, sort=False)
print (df)
    name gender       state  num_children  num_pets sums
0   sums                               12        18    
1   john      M  california             2         5   7
2   mary      F          dc             0         1   1
3  peter      M  california             0         0   0
4   jeff      M          dc             3         5   8
5   bill      M  california             2         2   4
6   lisa      F       texas             1         2   3
7   jose      M       texas             4         3   7

Answer 1

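You can use mean on a boolean mask created by comparing the values with 0 via DataFrame.eq, because sum/len = mean by definition. Then multiply by 100 and add the percent sign with apply: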

s = df[['num_children', 'num_pets']].eq(0).mean(axis=1)
df['pct'] = s.mul(100).apply("{0:.0f}%".format)
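
To illustrate why mean can stand in for sum/len here, the following sketch computes the row-wise share of zeros both ways on the example's two numeric columns and checks that they agree. The variable names (mask, by_mean, by_sum_len, formatted) are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'num_children': [2, 0, 0, 3, 2, 1, 4],
                   'num_pets': [5, 1, 0, 5, 2, 2, 3]})
cols = ['num_children', 'num_pets']

# Boolean mask: True where the value equals zero
mask = df[cols].eq(0)

# Row-wise share of zeros, computed two ways
by_mean = mask.mean(axis=1)
by_sum_len = mask.sum(axis=1) / len(cols)

# Both approaches should give identical values
print((by_mean == by_sum_len).all())

# Format as whole-number percentages, as in the answer
formatted = by_mean.mul(100).apply("{0:.0f}%".format)
print(formatted.tolist())
```

True counts as 1 and False as 0, so averaging the mask is the same as counting the zeros and dividing by the number of columns.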

For the first row, create a new one-row DataFrame with the same columns as the original and join the two with concat:

df1 = (df[['num_children', 'num_pets']].eq(0)
                                       .mean()
                                       .mul(100)
                                       .apply("{0:.1f}%".format)
                                       .to_frame()
                                       .T
                                       .assign(name='pct.'))

df = pd.concat([df1.reindex(columns=df.columns, fill_value=''), df], 
                ignore_index=True, sort=False)
print (df)
    name gender       state num_children num_pets   pct
0   pct.                           28.6%    14.3%      
1   john      M  california            2        5    0%
2   mary      F          dc            0        1   50%
3  peter      M  california            0        0  100%
4   jeff      M          dc            3        5    0%
5   bill      M  california            2        2    0%
6   lisa      F       texas            1        2    0%
7   jose      M       texas            4        3    0%
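
The two steps can also be wrapped in one function. This is only a sketch: add_zero_pct and its parameters (cols, label, label_col) are hypothetical names, not part of pandas, and it assumes the frame has a text column (here name) that can hold the row label.

```python
import pandas as pd


def add_zero_pct(df, cols, label='pct.', label_col='name'):
    """Return a copy of df with a zero-percentage column and a leading summary row.

    Hypothetical helper for illustration; not a pandas API.
    """
    out = df.copy()
    mask = out[cols].eq(0)  # True where the value equals zero

    # Per-row percentage of zeros across the selected columns
    out[label] = mask.mean(axis=1).mul(100).apply("{0:.0f}%".format)

    # Per-column percentage of zeros, shaped as a one-row DataFrame
    summary = (mask.mean()
                   .mul(100)
                   .apply("{0:.1f}%".format)
                   .to_frame()
                   .T
                   .assign(**{label_col: label}))

    # Align the summary row to all columns, leaving the other cells blank
    summary = summary.reindex(columns=out.columns, fill_value='')
    return pd.concat([summary, out], ignore_index=True, sort=False)


df = pd.DataFrame({
    'name': ['john', 'mary', 'peter', 'jeff', 'bill', 'lisa', 'jose'],
    'gender': ['M', 'F', 'M', 'M', 'M', 'F', 'M'],
    'state': ['california', 'dc', 'california', 'dc', 'california', 'texas', 'texas'],
    'num_children': [2, 0, 0, 3, 2, 1, 4],
    'num_pets': [5, 1, 0, 5, 2, 2, 3],
})
result = add_zero_pct(df, ['num_children', 'num_pets'])
print(result)
```

Keep in mind that placing formatted strings in the same columns as the numbers turns those columns into object dtype, so a layout like this is best built as the last step, for display, while the calculations are done on the original numeric frame.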

Solution 1
4 Accepted 2019-03-06 11:16:02
