This is a little hard to explain so bear with me please.
Assume I have a table like the one below.
How can I create a new DataFrame that meets the criteria below?
It has 5 rows. Each row covers the values from column A that fall within a range: say, the first row is for values between (200, 311), the second row for values between (312, 370), etc.
It has 3 columns. Each column covers the values from column B that fall within a range: say, the first column is for values between (1, 16), the second column for values between (17, 50), etc.
The value of each cell is the sum of the values from column C whose A and B values fall in the corresponding row and column ranges.
Example:
Can anyone illustrate how to do this? The numbers are random; you don't need to follow my example.
Thanks a lot!
My solution was to pre-define the row criteria and column criteria in two lists, then run nested loops to fill each cell value into the new DataFrame. It works and is not that slow, but since this is a pandas DataFrame, I am wondering whether there is a way to do it with a query, without any loops.
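For reference, a minimal sketch of what such a nested-loop approach might look like. The toy data, the range lists, and the row and column labels are all assumptions made for illustration, not the actual code from the question.

```python
import pandas as pd

# Hypothetical toy data, small enough to check by hand.
df = pd.DataFrame({
    "A": [200, 311, 312, 370, 250],
    "B": [1, 16, 17, 50, 20],
    "C": [10, 20, 30, 40, 50],
})

# Pre-defined (inclusive) row and column ranges; assumed values.
row_ranges = [(200, 311), (312, 370)]
col_ranges = [(1, 16), (17, 50)]

# Empty result frame, labelled by range.
result = pd.DataFrame(
    0,
    index=[f"{lo}-{hi}" for lo, hi in row_ranges],
    columns=[f"{lo}-{hi}" for lo, hi in col_ranges],
)

# Nested loops: one boolean mask and one sum per cell.
for r_lo, r_hi in row_ranges:
    for c_lo, c_hi in col_ranges:
        mask = df["A"].between(r_lo, r_hi) & df["B"].between(c_lo, c_hi)
        result.loc[f"{r_lo}-{r_hi}", f"{c_lo}-{c_hi}"] = df.loc[mask, "C"].sum()

print(result)
```

Each cell costs one full pass over the data, which is why a grouped approach is attractive for larger tables.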
Thanks again!
You can use cut to get your ranges, and then supply them to pivot_table to get the sums:
import numpy as np
import pandas as pd

# Setup example data.
np.random.seed([3, 1415])
n = 100
df = pd.DataFrame({
    'A': np.random.randint(200, 601, size=n),
    'B': np.random.randint(1, 101, size=n),
    'C': np.random.randint(25, size=n)
})

# Use cut to get the ranges. include_lowest=True closes the first bin on the left,
# so the minimum values (200 and 1) are not dropped.
a_bins = pd.cut(df['A'], bins=[200, 311, 370, 450, 550, 600], include_lowest=True)
b_bins = pd.cut(df['B'], bins=[1, 16, 67, 100], include_lowest=True)

# Pivot to get the sums; fill_value=0 fills cells with no matching rows.
df2 = df.pivot_table(index=a_bins, columns=b_bins, values='C', aggfunc='sum', fill_value=0)
The resulting output:
B           [1, 16]  (16, 67]  (67, 100]
A
[200, 311]       82       118        153
(311, 370]       68        56         45
(370, 450]       41       129         40
(450, 550]       32       121         57
(550, 600]        0       112         47
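To check by hand how the bin edges line up with integer ranges like those in the question, here is a minimal sketch on toy data. The data values and the two-bin edges are assumptions chosen for illustration. With integer data, bins=[200, 311, 370] and include_lowest=True produce intervals equivalent to 200-311 and 312-370, because the second interval is open on the left at 311.

```python
import pandas as pd

# Hypothetical toy data, small enough to verify each cell manually.
df = pd.DataFrame({
    "A": [200, 311, 312, 370, 250],
    "B": [1, 16, 17, 50, 20],
    "C": [10, 20, 30, 40, 50],
})

# Right-closed bins; include_lowest=True keeps A == 200 and B == 1.
a_bins = pd.cut(df["A"], bins=[200, 311, 370], include_lowest=True)
b_bins = pd.cut(df["B"], bins=[1, 16, 50], include_lowest=True)

# observed=False is passed explicitly so that all bin combinations are kept.
table = df.pivot_table(
    index=a_bins, columns=b_bins, values="C",
    aggfunc="sum", fill_value=0, observed=False,
)
print(table)
```

The boundary rows (A == 311 and A == 312) land in different row bins, and the cell with no matching rows is filled with 0. The exact interval labels printed can differ between pandas versions.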
I really like @root's solution! Here is a slightly modified one-liner version, which uses the pd.crosstab method:
In [102]: pd.crosstab(
...: pd.cut(df['A'], bins=[200, 311, 370, 450, 550, 600], include_lowest=True),
...: pd.cut(df['B'], bins=[1, 16, 67, 100], include_lowest=True),
...: df['C'],
...: aggfunc='sum'
...: )
...:
Out[102]:
B           [1, 16]  (16, 67]  (67, 100]
A
[200, 311]       31       157        117
(311, 370]       23        90         38
(370, 450]      110       168         60
(450, 550]       37       117        115
(550, 600]       35        19         49
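One difference worth noting is that this crosstab call has no fill_value, so a cell with no matching rows may come back as NaN rather than 0, depending on the pandas version. A minimal sketch on hypothetical toy data (the values and bin edges are assumptions for illustration) that normalises such cells with fillna:

```python
import pandas as pd

# Hypothetical toy data; the (312-370, 1-16) cell has no matching rows.
df = pd.DataFrame({
    "A": [200, 311, 312, 370, 250],
    "B": [1, 16, 17, 50, 20],
    "C": [10, 20, 30, 40, 50],
})

a_bins = pd.cut(df["A"], bins=[200, 311, 370], include_lowest=True)
b_bins = pd.cut(df["B"], bins=[1, 16, 50], include_lowest=True)

# crosstab with values= and aggfunc= sums C per (row bin, column bin).
ct = pd.crosstab(a_bins, b_bins, values=df["C"], aggfunc="sum")

# Replace any missing cells with 0 and restore an integer dtype.
ct = ct.fillna(0).astype(int)
print(ct)
```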