Filter a Python pandas DataFrame by grouping on multiple columns

Question

This is a little hard to explain, so please bear with me.

Assume I have a table like the one below.

How can I create a new dataframe that matches the criteria below?

It has 5 rows. Each row corresponds to a range of values from column A: say the first row covers (200, 311), the second row covers (312, 370), etc.
It has 3 columns. Each column corresponds to a range of values from column B: say the first column covers (1, 16), the second column covers (17, 50), etc.
The value of each cell is the sum of the values from column C whose A and B values fall in the corresponding row and column ranges.

Example:

Could anyone illustrate how to do this? The numbers are random; you don't need to follow my example.

Thanks a lot!

My solution was to predefine the row criteria and column criteria in two lists, then run nested loops to fill each cell value into the new dataframe. It works and is not that slow, but since this is a pandas dataframe, I wonder whether there is a way to do this with a query, without any loops.

Thanks again!
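
For reference, the loop-based approach described in the question might look roughly like the sketch below. The original table was not preserved, so the data is randomly generated, and only the first two ranges for each column come from the question; the remaining ranges, the `row_ranges`/`col_ranges` names, and the row and column labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the table from the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.integers(200, 601, size=100),  # values in 200..600
    "B": rng.integers(1, 101, size=100),    # values in 1..100
    "C": rng.integers(0, 25, size=100),     # values in 0..24
})

# Inclusive (low, high) ranges. Only the first two entries of each list
# appear in the question; the rest are assumed.
row_ranges = [(200, 311), (312, 370), (371, 450), (451, 550), (551, 600)]
col_ranges = [(1, 16), (17, 50), (51, 100)]

# Start from an all-zero table with one row per A range and one column per B range.
result = pd.DataFrame(
    0,
    index=[f"{lo}-{hi}" for lo, hi in row_ranges],
    columns=[f"{lo}-{hi}" for lo, hi in col_ranges],
)

# Nested loops: sum C over the rows whose A and B fall in each pair of ranges.
for (a_lo, a_hi), row_label in zip(row_ranges, result.index):
    for (b_lo, b_hi), col_label in zip(col_ranges, result.columns):
        mask = df["A"].between(a_lo, a_hi) & df["B"].between(b_lo, b_hi)
        result.loc[row_label, col_label] = df.loc[mask, "C"].sum()

print(result)
```

This serves as a baseline: it scans the whole frame once per cell, which is the repeated work that a vectorized approach avoids.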

Answer 1

You can use cut to get your ranges, and then supply them to pivot_table to get the sums:

import numpy as np
import pandas as pd

# Set up example data.
np.random.seed([3, 1415])
n = 100
df = pd.DataFrame({
    'A': np.random.randint(200, 601, size=n),
    'B': np.random.randint(1, 101, size=n),
    'C': np.random.randint(25, size=n)
    })

# Use cut to get the ranges.
a_bins = pd.cut(df['A'], bins=[200, 311, 370, 450, 550, 600], include_lowest=True)
b_bins = pd.cut(df['B'], bins=[1, 16, 67, 100], include_lowest=True)

# Pivot to get the sums.
df2 = df.pivot_table(index=a_bins, columns=b_bins, values='C', aggfunc='sum', fill_value=0)

The resulting output:

B           [1, 16]  (16, 67]  (67, 100]
A                                       
[200, 311]       82       118        153
(311, 370]       68        56         45
(370, 450]       41       129         40
(450, 550]       32       121         57
(550, 600]        0       112         47
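
To see how the bin edges map onto the inclusive integer ranges from the question, here is a small sketch on hand-picked boundary values. The `labels` argument, the label strings, and the `A_bin`/`B_bin` column names are additions for illustration and are not part of the answer above; `observed=False` is passed explicitly because the default handling of categorical groupers has changed between pandas versions. Recent pandas versions may also display the first interval differently from the `[200, 311]` shown above, which is another reason to set explicit labels.

```python
import pandas as pd

# Hand-picked values sitting exactly on the bin boundaries.
df = pd.DataFrame({
    "A": [200, 311, 312, 370, 371, 600],
    "B": [1, 16, 17, 67, 68, 100],
    "C": [1, 2, 3, 4, 5, 6],
})

# Bins are right-closed, so (311, 370] holds the integers 312..370.
# include_lowest=True makes the first bin also contain its left edge.
df["A_bin"] = pd.cut(
    df["A"],
    bins=[200, 311, 370, 450, 550, 600],
    labels=["200-311", "312-370", "371-450", "451-550", "551-600"],
    include_lowest=True,
)
df["B_bin"] = pd.cut(
    df["B"],
    bins=[1, 16, 67, 100],
    labels=["1-16", "17-67", "68-100"],
    include_lowest=True,
)

# Sum C for every (A range, B range) combination.
table = df.pivot_table(
    index="A_bin", columns="B_bin", values="C",
    aggfunc="sum", fill_value=0, observed=False,
)
print(table)
```

Note that values outside the outermost edges (for example `A = 199`) would get no bin and be dropped from the sums, so the edges need to cover the full range of the data.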

Answer 2

I really like @root's solution! Here is a slightly modified one-liner version, which uses the pd.crosstab function:

In [102]: pd.crosstab(
     ...:     pd.cut(df['A'], bins=[200, 311, 370, 450, 550, 600], include_lowest=True),
     ...:     pd.cut(df['B'], bins=[1, 16, 67, 100], include_lowest=True),
     ...:     df['C'],
     ...:     aggfunc='sum'
     ...: )
     ...:
Out[102]:
B           [1, 16]  (16, 67]  (67, 100]
A
[200, 311]       31       157        117
(311, 370]       23        90         38
(370, 450]      110       168         60
(450, 550]       37       117        115
(550, 600]       35        19         49
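
The numbers in the two answers differ, presumably because each was run on different random data. As a rough check that the two approaches agree when given the same frame, the sketch below builds both tables from one dataset. The seed, the `A_bin`/`B_bin` column names, and the `.fillna(0)` call are assumptions added here: `pd.crosstab` has no `fill_value` parameter, so combinations with no rows can come back as NaN.

```python
import numpy as np
import pandas as pd

# Arbitrary seed; not the data used in either answer.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.integers(200, 601, size=100),
    "B": rng.integers(1, 101, size=100),
    "C": rng.integers(0, 25, size=100),
})

# Same bin edges as in the answers, stored as columns for reuse.
df["A_bin"] = pd.cut(df["A"], bins=[200, 311, 370, 450, 550, 600], include_lowest=True)
df["B_bin"] = pd.cut(df["B"], bins=[1, 16, 67, 100], include_lowest=True)

# crosstab version: fill empty combinations with 0 afterwards.
via_crosstab = pd.crosstab(
    df["A_bin"], df["B_bin"], values=df["C"], aggfunc="sum"
).fillna(0)

# pivot_table version: fill_value handles empty combinations directly.
via_pivot = df.pivot_table(
    index="A_bin", columns="B_bin", values="C",
    aggfunc="sum", fill_value=0, observed=False,
)

print(via_crosstab)
print(via_pivot)
```

Both tables should account for every value of C, since the bin edges cover the full range of A and B in this generated data.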

Answer 1
Score 3, accepted, 2017-03-27 19:48:14

Answer 2
Score 1, 2017-03-27 20:46:34
