How to count values for columns in a groupby dataframe?
You are trying to use the values of Time as the new columns and the values of the other columns as the new index. If you had just a few columns, this could easily be achieved with a pivot_table. E.g. for 'Health':
In [1]: import pandas as pd

In [2]: df = pd.DataFrame([['T0', 'Yes', 'Good'], ['T0', 'Yes', 'Bad'], ['T1', 'No', 'Good'], ['T1', 'No', 'Good']], columns=['Time', 'Health', 'Meds'])

In [18]: pd.pivot_table(df[['Health', 'Time']], index='Health', columns='Time', aggfunc='size', fill_value=0)
Out[18]:
Time    T0  T1
Health
No       0   2
Yes      2   0

However, you want to repeat that procedure for all columns. This is also possible with a pivot table, provided you first reshape your dataframe into a long data format. This means you create a new column that holds all the column names, which is exactly what the stack() function is for:
In [45]: df_stacked = df.set_index('Time').stack().rename('value').reset_index()

In [46]: df_stacked
Out[46]:
  Time level_1 value
0   T0  Health   Yes
1   T0    Meds  Good
2   T0  Health   Yes
3   T0    Meds   Bad
4   T1  Health    No
5   T1    Meds  Good
6   T1  Health    No
7   T1    Meds  Good

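If stack() feels opaque, melt() is another way to get the same long format. This is only a sketch of an alternative, not something the question requires; the names df_long and 'variable' are my own choices. Note that melt() orders the rows column by column (all Health rows first, then all Meds rows) rather than row by row, which does not matter for counting.

```python
import pandas as pd

df = pd.DataFrame(
    [['T0', 'Yes', 'Good'], ['T0', 'Yes', 'Bad'],
     ['T1', 'No', 'Good'], ['T1', 'No', 'Good']],
    columns=['Time', 'Health', 'Meds'],
)

# melt() keeps 'Time' as an identifier and turns every other column into
# (variable, value) pairs, one row per original cell.
df_long = df.melt(id_vars='Time', var_name='variable', value_name='value')
print(df_long)
```

Here the 'variable' column plays the role that 'level_1' plays in the stacked frame.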
Now you can pivot the stacked dataframe, using both the column that holds the original column names and the column that holds their values as the index:
In [48]: pd.pivot_table(df_stacked, index=['level_1', 'value'], columns='Time', aggfunc='size', fill_value=0)
Out[48]:
Time           T0  T1
level_1 value
Health  No      0   2
        Yes     2   0
Meds    Bad     1   0
        Good    1   2

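For reuse, the steps can be wrapped in one small function. The sketch below is a variation, not the exact code above: the helper name count_values_per_time and the level name 'column' are hypothetical, and it uses groupby(...).size().unstack(...) in place of pivot_table, which I would expect to give the same counts for the sample data.

```python
import pandas as pd


def count_values_per_time(df, time_col='Time'):
    """Count every (column, value) pair per level of `time_col`.

    Hypothetical helper for illustration; assumes `time_col` exists in `df`.
    """
    # Long format: one row per original cell, tagged with its column name.
    stacked = (
        df.set_index(time_col)
          .stack()
          .rename('value')
          .rename_axis([time_col, 'column'])
          .reset_index()
    )
    # Count rows per (column, value, time) and spread time across the columns.
    return (
        stacked.groupby(['column', 'value', time_col])
               .size()
               .unstack(time_col, fill_value=0)
    )


df = pd.DataFrame(
    [['T0', 'Yes', 'Good'], ['T0', 'Yes', 'Bad'],
     ['T1', 'No', 'Good'], ['T1', 'No', 'Good']],
    columns=['Time', 'Health', 'Meds'],
)
counts = count_values_per_time(df)
print(counts)
```

The fill_value=0 argument does the same job as in pivot_table: combinations that never occur show up as 0 instead of NaN.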
The description is probably a bit confusing, but I hope the code makes it clear. You basically had the right ingredients, but the combination is a bit tricky.