Calculate nonzeros percentage for specific columns and each row in Pandas

Suppose I have the following dataframe:

   import pandas as pd

   df = pd.DataFrame({
       'name': ['john', 'mary', 'peter', 'jeff', 'bill', 'lisa', 'jose'],
       'gender': ['M', 'F', 'M', 'M', 'M', 'F', 'M'],
       'state': ['california', 'dc', 'california', 'dc', 'california', 'texas', 'texas'],
       'num_children': [2, 0, 0, 3, 2, 1, 4],
       'num_pets': [5, 1, 0, 5, 2, 2, 3],
   })

    name gender       state      num_children  num_pets
0   john      M  california             2         5
1   mary      F          dc             0         1
2  peter      M  california             0         0
3   jeff      M          dc             3         5
4   bill      M  california             2         2
5   lisa      F       texas             1         2
6   jose      M       texas             4         3

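I want to create a new row and a new column, both named pct., that hold the percentage of zero values in the num_children and num_pets columns. Expected output: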

    name gender       state      num_children  num_pets   pct.
0   pct.                              28.6%     14.3%     
1   john      M  california             2         5        0% 
2   mary      F          dc             0         1       50%
3  peter      M  california             0         0      100%
4   jeff      M          dc             3         5        0% 
5   bill      M  california             2         2        0%
6   lisa      F       texas             1         2        0%
7   jose      M       texas             4         3        0%

I have already calculated the percentage of zeros in the target columns for each row:

df['pct'] = df[['num_children', 'num_pets']].astype(bool).sum(axis=1)/2
df['pct.'] = 1-df['pct']
del df['pct']
df['pct.'] = pd.Series(["{0:.0f}%".format(val * 100) for val in df['pct.']], index = df.index)

    name gender       state  num_children  num_pets  pct.
0   john      M  california             2         5    0%
1   mary      F          dc             0         1   50%
2  peter      M  california             0         0  100%
3   jeff      M          dc             3         5    0%
4   bill      M  california             2         2    0%
5   lisa      F       texas             1         2    0%
6   jose      M       texas             4         3    0%

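But I don't know how to insert the result below into the pct. row, as in the expected output. Please help me get the expected result in a more Pythonic way. Thanks.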

df[['num_children', 'num_pets']].astype(bool).sum(axis=0)/len(df.num_children)
Out[153]: 
num_children    0.714286
num_pets        0.857143
dtype: float64

Update: the same thing, but for sums. Many thanks to @jezrael:

df['sums'] = df[['num_children', 'num_pets']].sum(axis=1)
df1 = (df[['num_children', 'num_pets']].sum()
                                       .to_frame()
                                       .T
                                       .assign(name='sums'))

df = pd.concat([df1.reindex(columns=df.columns, fill_value=''), df], 
                ignore_index=True, sort=False)
print (df)
    name gender       state  num_children  num_pets sums
0   sums                               12        18    
1   john      M  california             2         5   7
2   mary      F          dc             0         1   1
3  peter      M  california             0         0   0
4   jeff      M          dc             3         5   8
5   bill      M  california             2         2   4
6   lisa      F       texas             1         2   3
7   jose      M       texas             4         3   7

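You can take the mean of the boolean mask obtained by comparing the values with 0 using DataFrame.eq, because sum/len = mean by definition. Then multiply by 100 and append the percent sign with apply: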

s = df[['num_children', 'num_pets']].eq(0).mean(axis=1)
df['pct'] = s.mul(100).apply("{0:.0f}%".format)
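
As a quick check of the sum/len = mean identity, the sketch below compares the counting approach from the question with the mean of the eq(0) mask. It assumes only the two numeric columns from the sample data; the names cols, by_count and by_mean are illustrative and not part of the original answer.

```python
import pandas as pd

# Only the two numeric columns from the question's sample data
df = pd.DataFrame({
    'num_children': [2, 0, 0, 3, 2, 1, 4],
    'num_pets': [5, 1, 0, 5, 2, 2, 3],
})
cols = ['num_children', 'num_pets']

# Question's approach: fraction of nonzeros per row, then its complement
by_count = 1 - df[cols].astype(bool).sum(axis=1) / len(cols)

# Answer's approach: mean of the "equals zero" boolean mask per row
by_mean = df[cols].eq(0).mean(axis=1)

# Largest absolute difference between the two results
max_diff = (by_count - by_mean).abs().max()
print(max_diff)

# The same mask averaged per column gives the values for the summary row
col_pct = df[cols].eq(0).mean().mul(100)
print(col_pct.round(1))
```

Averaging a boolean mask works because True counts as 1 and False as 0, so the mean is the number of zeros divided by the number of values.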

For the first row, create a new DataFrame with the same columns as the original and use concat:

df1 = (df[['num_children', 'num_pets']].eq(0)
                                       .mean()
                                       .mul(100)
                                       .apply("{0:.1f}%".format)
                                       .to_frame()
                                       .T
                                       .assign(name='pct.'))

df = pd.concat([df1.reindex(columns=df.columns, fill_value=''), df], 
                ignore_index=True, sort=False)
print (df)
    name gender       state num_children num_pets   pct
0   pct.                           28.6%    14.3%      
1   john      M  california            2        5    0%
2   mary      F          dc            0        1   50%
3  peter      M  california            0        0  100%
4   jeff      M          dc            3        5    0%
5   bill      M  california            2        2    0%
6   lisa      F       texas            1        2    0%
7   jose      M       texas            4        3    0%
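
If this is needed more than once, both steps could be wrapped in a small helper. This is only a sketch under the assumptions of the example above: the function name add_zero_pct and its parameters cols, label and label_col are hypothetical, and the summary row is always placed first.

```python
import pandas as pd


def add_zero_pct(df, cols, label='pct.', label_col='name'):
    """Return a copy of df with a per-row zero-percentage column
    and a leading summary row with per-column zero percentages.

    Hypothetical helper: names and defaults are illustrative only.
    """
    out = df.copy()
    mask = out[cols].eq(0)  # True where the value is zero

    # Per-row percentage of zeros, formatted without decimals
    out[label] = mask.mean(axis=1).mul(100).apply("{0:.0f}%".format)

    # Per-column percentage of zeros as a one-row DataFrame
    summary = mask.mean().mul(100).apply("{0:.1f}%".format).to_frame().T
    summary[label_col] = label

    # Align the summary row to the full column set, blank where undefined
    summary = summary.reindex(columns=out.columns, fill_value='')
    return pd.concat([summary, out], ignore_index=True, sort=False)


df = pd.DataFrame({
    'name': ['john', 'mary', 'peter', 'jeff', 'bill', 'lisa', 'jose'],
    'gender': ['M', 'F', 'M', 'M', 'M', 'F', 'M'],
    'state': ['california', 'dc', 'california', 'dc', 'california', 'texas', 'texas'],
    'num_children': [2, 0, 0, 3, 2, 1, 4],
    'num_pets': [5, 1, 0, 5, 2, 2, 3],
})
result = add_zero_pct(df, ['num_children', 'num_pets'])
print(result)
```

Note that after the concat the numeric columns hold a mix of strings and integers, so they become object dtype; this layout suits display rather than further computation.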
