
How to count values for columns in a groupby dataframe?

So I'm having trouble with this one:

[image from the original post; no description available]

I've tried this, but it stacks time:

df2 = df.groupby(['Time'])
for group, data in df2:
    result = data.apply(lambda x: x.value_counts()).T.stack()
    print(result)

You are trying to use the values of Time as the new columns and the values of the other columns as the new index. If you had just a few columns, this could easily be achieved with a pivot_table. For example, for 'Health':

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([['T0', 'Yes', 'Good'], ['T0', 'Yes', 'Bad'], ['T1', 'No', 'Good'], ['T1', 'No', 'Good']], columns=['Time', 'Health', 'Meds'])

In [18]: pd.pivot_table(df[['Health', 'Time']], index='Health', columns='Time', aggfunc='size', fill_value=0)
Out[18]: 
Time    T0  T1
Health        
No       0   2
Yes      2   0
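As a cross-check on the single-column case, here is a minimal sketch that builds the same kind of count table with pd.crosstab instead of pivot_table. This is a swapped-in alternative, not what the answer itself uses; the variable name health_counts is made up for illustration, and the sample data is copied from the snippet above.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame(
    [['T0', 'Yes', 'Good'], ['T0', 'Yes', 'Bad'],
     ['T1', 'No', 'Good'], ['T1', 'No', 'Good']],
    columns=['Time', 'Health', 'Meds'],
)

# Count how often each Health value occurs for each Time value.
# Combinations that never occur are reported as 0.
health_counts = pd.crosstab(df['Health'], df['Time'])  # hypothetical name
print(health_counts)
```

crosstab only handles the one row variable you pass in, so it has the same limitation as the pivot_table call above: it would have to be repeated for every column.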

However, you want to repeat that procedure for all columns. This is also possible with a pivot table, provided you first reshape your dataframe into a long format. That means creating a new column that holds the original column names, which is exactly what the stack() function is for:

In [45]: df_stacked = df.set_index('Time').stack().rename('value').reset_index()

In [46]: df_stacked
Out[46]:
  Time level_1 value
0   T0  Health   Yes
1   T0    Meds  Good
2   T0  Health   Yes
3   T0    Meds   Bad
4   T1  Health    No
5   T1    Meds  Good
6   T1  Health    No
7   T1    Meds  Good
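If the stack() chain feels opaque, the same long format can probably be reached with DataFrame.melt, sketched below under the assumption that row order does not matter (melt groups rows by original column rather than interleaving them). The name df_long is made up for illustration; the column names level_1 and value are chosen only to match the stacked output above.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame(
    [['T0', 'Yes', 'Good'], ['T0', 'Yes', 'Bad'],
     ['T1', 'No', 'Good'], ['T1', 'No', 'Good']],
    columns=['Time', 'Health', 'Meds'],
)

# Keep Time as the identifier and turn every other column into
# (column name, value) pairs, one pair per row.
df_long = df.melt(id_vars='Time', var_name='level_1', value_name='value')  # hypothetical name
print(df_long)
```

The result contains the same eight rows as df_stacked, just in a different order, so it should work equally well as input for the pivot step.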

Now you can pivot the stacked dataframe, using both the column that holds the original column names (level_1) and the column that holds their values (value) as the index:

In [48]: pd.pivot_table(df_stacked, index=['level_1', 'value'], columns='Time', aggfunc='size', fill_value=0)
Out[48]: 
Time           T0  T1
level_1 value        
Health  No      0   2
        Yes     2   0
Meds    Bad     1   0
        Good    1   2
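Putting the two steps together, here is a self-contained sketch wrapped in a small helper. The function name count_values_by and the index label 'column' are hypothetical, introduced only for this example. For the final step it uses groupby(...).size().unstack(...) rather than pivot_table; that is a substitution on my part, which should give the same counts.

```python
import pandas as pd


def count_values_by(df, by):
    """Count each value of every other column, per value of `by`.

    Hypothetical helper: reshapes to long format with stack(),
    then counts with groupby/size and spreads `by` across the columns.
    """
    stacked = (
        df.set_index(by)
          .stack()                       # one row per (by, original column)
          .rename('value')               # name the stacked values
          .rename_axis([by, 'column'])   # name the index levels
          .reset_index()
    )
    return (
        stacked.groupby(['column', 'value', by])
               .size()                   # occurrences of each combination
               .unstack(by, fill_value=0)  # missing combinations become 0
    )


# Sample data copied from the answer above.
df = pd.DataFrame(
    [['T0', 'Yes', 'Good'], ['T0', 'Yes', 'Bad'],
     ['T1', 'No', 'Good'], ['T1', 'No', 'Good']],
    columns=['Time', 'Health', 'Meds'],
)

counts = count_values_by(df, 'Time')
print(counts)
```

Individual counts can then be looked up by (column, value) pair, for example counts.loc[('Meds', 'Good'), 'T1'].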

The description is probably a bit confusing, but I hope the code makes it clear. You basically had the right ingredients, but the combination is a bit tricky.
