[英]Pandas: compute numerous columns of percentage values
我无法遍历select dataframe列的值以创建表示百分比值的新列。 可重现的示例:
data = {'Respondents': [90, 43, 89, '89', '67', '88', '73', '78', '62', '101'],
'answer_1': [51, 15, 15, 61, 16, 14, 15, 1, 0, 16],
'answer_2': [11, 12, 14, 40, 36, 78, 12, 0, 26, 78],
'answer_3': [3, 8, 4, 0, 2, 7, 10, 11, 6, 7]}
df = pd.DataFrame(data)
df
Respondents answer_1 answer_2 answer_3
0 90 51 11 3
1 43 15 12 8
2 89 15 14 4
3 89 61 35 0
4 67 16 36 2
5 88 14 78 7
6 73 15 12 10
7 78 1 0 11
8 62 0 26 6
9 101 16 78 7
目的是计算每个答案列相对于总答复者的百分比。 例如,对于新的answer_1
列-我们将其命名为answer_1_perc
第一个值为46(因为51是90的46%),下一个值为35(15是43的35%)。 然后将有answer_2_perc
和answer_3_perc
列。
我写了很多下面代码的迭代,这真是令人头晕。
for columns in df.iloc[:, 1:4]:
for i in columns:
i_name = 'percentage_' + str(columns)
i_group = ([i] / df['Respondents'] * 100)
df[i_name] = i_group
做这个的最好方式是什么? 我需要使用迭代方法,因为我的实际数据有25个答案列,而不是本示例中显示的3个答案列。
差不多用完了,请注意,您在受访者col中有字符串值,在调用以下命令之前,我已对其进行更正:
In [172]:
for col in df.columns[1:4]:
i_name = 'percentage_' + col
i_group = (df[col] / df['Respondents']) * 100
df[i_name] = i_group
df
Out[172]:
Respondents answer_1 answer_2 answer_3 percentage_answer_1 \
0 90 51 11 3 56.666667
1 43 15 12 8 34.883721
2 89 15 14 4 16.853933
3 89 61 40 0 68.539326
4 67 16 36 2 23.880597
5 88 14 78 7 15.909091
6 73 15 12 10 20.547945
7 78 1 0 11 1.282051
8 62 0 26 6 0.000000
9 101 16 78 7 15.841584
percentage_answer_2 percentage_answer_3
0 12.222222 3.333333
1 27.906977 18.604651
2 15.730337 4.494382
3 44.943820 0.000000
4 53.731343 2.985075
5 88.636364 7.954545
6 16.438356 13.698630
7 0.000000 14.102564
8 41.935484 9.677419
9 77.227723 6.930693
我建议使用div和concat:
df['Respondents'] = df['Respondents'].astype(float)
df_pct = (df.drop('Respondents', axis=1)
.div(df['Respondents'], axis=0)
.mul(100)
.rename(columns=lambda col: 'percentage_' + col)
)
pd.concat([df, df_pct], axis=1)
Respondents answer_1 answer_2 answer_3 percentage_answer_1 \
0 90.0 51 11 3 56.666667
1 43.0 15 12 8 34.883721
2 89.0 15 14 4 16.853933
3 89.0 61 40 0 68.539326
4 67.0 16 36 2 23.880597
5 88.0 14 78 7 15.909091
6 73.0 15 12 10 20.547945
7 78.0 1 0 11 1.282051
8 62.0 0 26 6 0.000000
9 101.0 16 78 7 15.841584
percentage_answer_2 percentage_answer_3
0 12.222222 3.333333
1 27.906977 18.604651
2 15.730337 4.494382
3 44.943820 0.000000
4 53.731343 2.985075
5 88.636364 7.954545
6 16.438356 13.698630
7 0.000000 14.102564
8 41.935484 9.677419
9 77.227723 6.930693
另一种解决方案是使用div
期望的列按列的Respondents
,然后添加到新的列名称中:
print ('percentage_' + df.columns[1:4])
Index(['percentage_answer_1', 'percentage_answer_2', 'percentage_answer_3'], dtype='object')
df['percentage_' + df.columns[1:4]] = df.ix[:,1:4].div(df.Respondents, axis=0) * 100
print (df)
Respondents answer_1 answer_2 answer_3 percentage_answer_1 \
0 90 51 11 3 56.666667
1 43 15 12 8 34.883721
2 89 15 14 4 16.853933
3 89 61 40 0 68.539326
4 67 16 36 2 23.880597
5 88 14 78 7 15.909091
6 73 15 12 10 20.547945
7 78 1 0 11 1.282051
8 62 0 26 6 0.000000
9 101 16 78 7 15.841584
percentage_answer_2 percentage_answer_3
0 12.222222 3.333333
1 27.906977 18.604651
2 15.730337 4.494382
3 44.943820 0.000000
4 53.731343 2.985075
5 88.636364 7.954545
6 16.438356 13.698630
7 0.000000 14.102564
8 41.935484 9.677419
9 77.227723 6.930693
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.