
Pandas: compute numerous columns of percentage values

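I can't work out how to loop over the values of selected dataframe columns to create new columns holding percentage values. Reproducible example: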

import pandas as pd

data = {'Respondents': [90, 43, 89, '89', '67', '88', '73', '78', '62', '101'],
        'answer_1': [51, 15, 15, 61, 16, 14, 15, 1, 0, 16],
        'answer_2': [11, 12, 14, 40, 36, 78, 12, 0, 26, 78],
        'answer_3': [3, 8, 4, 0, 2, 7, 10, 11, 6, 7]}
df = pd.DataFrame(data)
df

    Respondents  answer_1   answer_2   answer_3
0   90           51         11         3
1   43           15         12         8
2   89           15         14         4
3   89           61         40         0
4   67           16         36         2
5   88           14         78         7
6   73           15         12         10
7   78           1          0          11
8   62           0          26         6
9   101          16         78         7

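The goal is to compute each answer column as a percentage of the total respondents. For example, in a new column for answer_1 (call it answer_1_perc), the first value would be 57 (because 51 is about 57% of 90) and the next value would be 35 (15 is about 35% of 43). There would then be answer_2_perc and answer_3_perc columns as well.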

I've written so many iterations of the code below that my head is spinning.

for columns in df.iloc[:, 1:4]:
    for i in columns:
        i_name = 'percentage_' + str(columns)
        i_group = ([i] / df['Respondents'] * 100)
        df[i_name] = i_group

What is the best way to do this? I need an iterative approach because my real data has 25 answer columns, not the 3 shown in this example.

You were almost there. Note that you have string values in the Respondents column, which I corrected before calling the following:

In [172]:

for col in df.columns[1:4]:
    i_name = 'percentage_' + col
    i_group = (df[col] / df['Respondents']) * 100
    df[i_name] = i_group

df
Out[172]:
   Respondents  answer_1  answer_2  answer_3  percentage_answer_1  \
0           90        51        11         3            56.666667   
1           43        15        12         8            34.883721   
2           89        15        14         4            16.853933   
3           89        61        40         0            68.539326   
4           67        16        36         2            23.880597   
5           88        14        78         7            15.909091   
6           73        15        12        10            20.547945   
7           78         1         0        11             1.282051   
8           62         0        26         6             0.000000   
9          101        16        78         7            15.841584   

   percentage_answer_2  percentage_answer_3  
0            12.222222             3.333333  
1            27.906977            18.604651  
2            15.730337             4.494382  
3            44.943820             0.000000  
4            53.731343             2.985075  
5            88.636364             7.954545  
6            16.438356            13.698630  
7             0.000000            14.102564  
8            41.935484             9.677419  
9            77.227723             6.930693  
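
The correction of the string values mentioned in this answer is not shown. One way it could be done, as a minimal sketch, is with `pd.to_numeric`, which parses numeric strings and leaves existing numbers alone. The small frame below reuses the question's data for two columns only; the variable name `pct` is just for illustration.

```python
import pandas as pd

# Same Respondents values as in the question: a mix of ints and numeric strings.
df = pd.DataFrame({
    'Respondents': [90, 43, 89, '89', '67', '88', '73', '78', '62', '101'],
    'answer_1': [51, 15, 15, 61, 16, 14, 15, 1, 0, 16],
})

# Because of the mixed types the column is stored as generic Python objects.
print(df['Respondents'].dtype)

# Parse every entry as a number. With the default errors='raise', a value
# that cannot be parsed raises instead of silently becoming NaN.
df['Respondents'] = pd.to_numeric(df['Respondents'])

# Division now works element-wise on numeric data.
pct = df['answer_1'] / df['Respondents'] * 100
print(pct.head())
```

Without a conversion like this, dividing by the column would typically fail with a `TypeError` as soon as it reaches one of the string entries.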

I would suggest using div and concat:

df['Respondents'] = df['Respondents'].astype(float)
df_pct = (df.drop('Respondents', axis=1)
            .div(df['Respondents'], axis=0)
            .mul(100)
            .rename(columns=lambda col: 'percentage_' + col)
          )
pd.concat([df, df_pct], axis=1)

   Respondents  answer_1  answer_2  answer_3  percentage_answer_1  \
0         90.0        51        11         3            56.666667   
1         43.0        15        12         8            34.883721   
2         89.0        15        14         4            16.853933   
3         89.0        61        40         0            68.539326   
4         67.0        16        36         2            23.880597   
5         88.0        14        78         7            15.909091   
6         73.0        15        12        10            20.547945   
7         78.0         1         0        11             1.282051   
8         62.0         0        26         6             0.000000   
9        101.0        16        78         7            15.841584   

   percentage_answer_2  percentage_answer_3  
0            12.222222             3.333333  
1            27.906977            18.604651  
2            15.730337             4.494382  
3            44.943820             0.000000  
4            53.731343             2.985075  
5            88.636364             7.954545  
6            16.438356            13.698630  
7             0.000000            14.102564  
8            41.935484             9.677419  
9            77.227723             6.930693  
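
With 25 answer columns, dropping `Respondents` works only as long as every remaining column is an answer column. A hedged variation, assuming all answer columns share the `answer_` prefix (true in the example, but an assumption about the real data), selects them by name with `filter` and renames them with `add_prefix`:

```python
import pandas as pd

# A shortened version of the question's data, with Respondents already numeric.
df = pd.DataFrame({
    'Respondents': [90, 43, 89],
    'answer_1': [51, 15, 15],
    'answer_2': [11, 12, 14],
    'answer_3': [3, 8, 4],
})

# Assumption: every answer column name starts with 'answer_'.
answers = df.filter(regex=r'^answer_')

# Divide each row by that row's respondent count, scale to percent,
# and prefix the resulting column names.
df_pct = (answers
          .div(df['Respondents'], axis=0)
          .mul(100)
          .add_prefix('percentage_'))

result = pd.concat([df, df_pct], axis=1)
print(result.columns.tolist())
```

Selecting by name rather than position means extra non-answer columns, if the real data has any, are left out of the calculation.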

Another solution is to divide the desired columns by the Respondents column with div, then assign the result to the new column names:

print('percentage_' + df.columns[1:4])
Index(['percentage_answer_1', 'percentage_answer_2', 'percentage_answer_3'], dtype='object')

# .ix was removed in pandas 1.0; .iloc is the positional equivalent here
df['percentage_' + df.columns[1:4]] = df.iloc[:, 1:4].div(df.Respondents, axis=0) * 100
print(df)
   Respondents  answer_1  answer_2  answer_3  percentage_answer_1  \
0           90        51        11         3            56.666667   
1           43        15        12         8            34.883721   
2           89        15        14         4            16.853933   
3           89        61        40         0            68.539326   
4           67        16        36         2            23.880597   
5           88        14        78         7            15.909091   
6           73        15        12        10            20.547945   
7           78         1         0        11             1.282051   
8           62         0        26         6             0.000000   
9          101        16        78         7            15.841584   

   percentage_answer_2  percentage_answer_3  
0            12.222222             3.333333  
1            27.906977            18.604651  
2            15.730337             4.494382  
3            44.943820             0.000000  
4            53.731343             2.985075  
5            88.636364             7.954545  
6            16.438356            13.698630  
7             0.000000            14.102564  
8            41.935484             9.677419  
9            77.227723             6.930693  
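
The question describes whole-number percentages (for example 35 for 15 out of 43) and an `answer_1_perc` naming scheme, while the answers return unrounded floats under a `percentage_` prefix. If whole numbers are wanted, one possible sketch is to round and cast after the division. This assumes Respondents is numeric and never zero, since a zero would produce an infinite value that cannot be cast to an integer.

```python
import pandas as pd

# First two rows of the question's data, with Respondents already numeric.
df = pd.DataFrame({
    'Respondents': [90, 43],
    'answer_1': [51, 15],
})

# Unrounded percentages: about 56.67 and 34.88.
pct = df['answer_1'].div(df['Respondents']).mul(100)

# Round to the nearest whole number and store as integers,
# using the column name suggested in the question.
df['answer_1_perc'] = pct.round().astype(int)
print(df['answer_1_perc'].tolist())  # [57, 35]
```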
