
Pandas DataFrame: grouping columns by column index ranges and getting group sums

I have a dataframe of survey responses. The column headers are scores, and the row beneath them holds the number of responses for each score.

index          |  1  |  2 |  3 |  4 |  5  |  6 |  7 |  8 |  9  | 10 | 11 |
--------------------------------------------------------------------------
Business unit  | 100 | 50 | 25 | 50 | 100 | 60 | 80 | 75 | 100 | 50 | 50 |

How do I group these columns into score ranges and sum the counts in each range, as shown below?

index          | <=6 | >=7 |
--------------------------------
Business unit  | 385 | 355 |

If your dataframe looks like this:

    1   2   3   4    5   6   7   8    9   10  11
0  100  50  25  50  100  60  80  75  100  50  50

You can use either pd.cut or np.digitize to assign each column label to a bin, then pass those bins to groupby(..., axis=1) so that the grouping runs across columns (horizontally) instead of across rows (vertically).

Using pd.cut

bins = pd.cut(df.columns, [0, 6, 11], labels=["<=6", ">=7"])
summary_df = df.groupby(bins, axis=1).sum()

print(summary_df)
   <=6  >=7
0  385  355
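
As a self-contained variant of the pd.cut approach, the sketch below groups the transposed frame instead of passing axis=1, since DataFrame.groupby(axis=1) is deprecated as of pandas 2.1. The bin edges [0, 6, 11] produce the right-closed intervals (0, 6] and (6, 11]. Passing observed=False is an assumption made here to keep both labels in the result and to avoid the default-change warning for categorical groupers in recent pandas versions.

```python
import pandas as pd

# Same test data as in the question: column labels are scores, values are counts.
data = {1: [100], 2: [50], 3: [25], 4: [50], 5: [100], 6: [60],
        7: [80], 8: [75], 9: [100], 10: [50], 11: [50]}
df = pd.DataFrame(data)

# Map each column label to a labelled interval: (0, 6] -> "<=6", (6, 11] -> ">=7".
bins = pd.cut(df.columns, [0, 6, 11], labels=["<=6", ">=7"])

# Transpose so the scores become rows, group the rows by bin, then transpose back.
summary_df = df.T.groupby(bins, observed=False).sum().T

print(summary_df)
```

The double transpose is cheap for a frame this small; on very wide frames with mixed dtypes it may be worth checking that the transpose does not change the column dtypes.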

Using np.digitize

bins = np.digitize(df.columns, [6], right=True)  # np.digitize is not aware of labels
summary_df = (
    df.groupby(bins, axis=1)
    .sum()
    .rename(columns={0: "<=6", 1: ">=7"})  # add our labels
)

print(summary_df)
   <=6  >=7
0  385  355
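
To make the np.digitize step easier to follow, here is a minimal sketch of what it returns for the column labels 1 to 11. With right=True, a value x goes into bin i when bins[i-1] < x <= bins[i], so every score up to and including 6 gets bin 0 and everything above gets bin 1. The np.bincount call is not part of the answer above; it is only an illustration of summing one row of counts per bin in plain NumPy.

```python
import numpy as np

scores = np.arange(1, 12)  # column labels 1..11
counts = np.array([100, 50, 25, 50, 100, 60, 80, 75, 100, 50, 50])

# right=True makes the single edge inclusive on the right: x <= 6 -> 0, x > 6 -> 1.
bin_ids = np.digitize(scores, [6], right=True)
print(bin_ids)  # [0 0 0 0 0 0 1 1 1 1 1]

# Sum the counts that fall into each bin id (bincount returns floats when weighted).
totals = np.bincount(bin_ids, weights=counts)
print(totals)  # [385. 355.]
```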

You can call .groupby() with a boolean array, created by comparing the column labels with a threshold (here 7), and then use GroupBy.sum() to sum the column values in each group. Finally, rename the resulting column labels from False/True to readable names:

df.groupby(df.columns >= 7, axis=1).sum().rename({True: '>=7', False: '<=6'}, axis=1)

Output:

   <=6  >=7
0  385  355
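
When there are only two groups, the same boolean comparison can also be used to select columns directly, without groupby. The sketch below is an alternative under that assumption, not the method from the answer above; the name `high` is introduced purely for illustration.

```python
import pandas as pd

data = {1: [100], 2: [50], 3: [25], 4: [50], 5: [100], 6: [60],
        7: [80], 8: [75], 9: [100], 10: [50], 11: [50]}
df = pd.DataFrame(data)

# Boolean array over the column labels: True where the score is 7 or higher.
high = df.columns >= 7

# Select each side of the split and sum across the columns, row by row.
summary_df = pd.DataFrame({
    "<=6": df.loc[:, ~high].sum(axis=1),
    ">=7": df.loc[:, high].sum(axis=1),
})

print(summary_df)
```

Because the sums are taken with axis=1, this also works unchanged if the frame has more than one row (for example, one row per business unit).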

Test data preparation:

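import pandas as pd
# Column labels are the scores; the single row holds the response count per score.
data = {1: [100], 2: [50], 3: [25], 4: [50], 5: [100], 6: [60], 7: [80], 8: [75], 9: [100], 10: [50], 11: [50]}
df = pd.DataFrame(data)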
