Pandas pivot_table: filter on aggregate function
I am trying to pass a criterion to the aggregate function of pandas pivot_table, but I can't figure out how to pass it to aggfunc. I have a data table that has been converted to a pandas DataFrame.
The input table data:
col1 | col2 | col3 | col4 | col5 | col6 | col7 |
---|---|---|---|---|---|---|
1 | test1 | t1 | Dummy1 | result1 | 10 | 102.2 |
2 | test1 | t1 | Dummy2 | result2 | 20 | 101.2 |
3 | test1 | t1 | Dummy3 | result3 | 30 | 102.3 |
4 | test1 | t1 | Dummy4 | result4 | 40 | 101.4 |
5 | test2 | t1 | Dummy1 | result1 | 10 | 100 |
6 | test2 | t1 | Dummy2 | result2 | 20 | 103 |
7 | test2 | t1 | Dummy3 | result3 | 30 | 104 |
8 | test2 | t1 | Dummy4 | result4 | 40 | 105 |
9 | test3 | t1 | Dummy1 | result1 | 10 | 102 |
10 | test3 | t1 | Dummy2 | result2 | 20 | 87 |
11 | test3 | t1 | Dummy3 | result3 | 30 | 107 |
12 | test3 | t1 | Dummy5 | result4 | 50 | 110.2 |
13 | test4 | t1 | Dummy2 | result2 | 20 | 120 |
14 | test5 | t1 | Dummy6 | result1 | 100 | 88 |
15 | test1 | t1 | Dummy1 | result2 | 10 | 106.2 |
16 | test1 | t1 | Dummy1 | result6 | 10 | 101.1 |
I want to get the maximum of the col7 data, but only when that maximum is greater than 100. If any of the col7 values is greater than the user-defined criterion, then the data in all the other columns must be populated, regardless of whether it meets the criterion.
I tried the following:
import pandas as pd

columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']
df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    'col2': ['test1', 'test1', 'test1', 'test1', 'test2', 'test2', 'test2',
             'test2', 'test3', 'test3', 'test3', 'test3', 'test4', 'test5',
             'test1', 'test1'],
    'col3': ['t1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1',
             't1', 't1', 't1', 't1', 't1'],
    'col4': ['Dummy1', 'Dummy2', 'Dummy3', 'Dummy4', 'Dummy1', 'Dummy2',
             'Dummy3', 'Dummy4', 'Dummy1', 'Dummy2', 'Dummy3', 'Dummy5',
             'Dummy2', 'Dummy6', 'Dummy1', 'Dummy1'],
    'col5': ['result1', 'result2', 'result3', 'result4', 'result1', 'result2',
             'result3', 'result4', 'result1', 'result2', 'result3', 'result4',
             'result2', 'result1', 'result2', 'result6'],
    'col6': [10, 20, 30, 40, 10, 20, 30, 40, 10, 20, 30, 50, 20, 100, 10, 10],
    'col7': [102.2, 101.2, 102.3, 101.4, 100.0, 103.0, 104.0, 105.0, 102.0,
             87.0, 107.0, 110.2, 120.0, 88.0, 106.2, 101.1]
})
# This is the attempt that fails with the TypeError shown below
res = df.pivot_table(values='col7', index=['col4', 'col5', 'col6'], columns=['col2'], fill_value='', aggfunc='max' >= 100)
TypeError: '>=' not supported between instances of 'str' and 'int'
The output should look like this.
Max pivoted output without col5:
col4 | col6 | test1 | test2 | test3 | test4 | test5 |
---|---|---|---|---|---|---|
Dummy1 | 10 | 106.2 | 100 | 102 | N/A | N/A |
Dummy2 | 20 | 101.2 | 103 | 87 | 120 | N/A |
Dummy3 | 30 | 102.3 | 104 | 107 | N/A | N/A |
Dummy4 | 40 | 101.4 | 105 | N/A | N/A | N/A |
Dummy5 | 50 | N/A | N/A | 110.2 | N/A | N/A |
Max pivoted output including col5:
col4 | col5 | col6 | test1 | test2 | test3 | test4 | test5 |
---|---|---|---|---|---|---|---|
Dummy1 | result2 | 10 | 106.2 | N/A | N/A | N/A | N/A |
Dummy1 | result1 | 10 | 102.2 | 100 | 102 | N/A | N/A |
Dummy2 | result2 | 20 | 101.2 | 103 | 87 | 120 | N/A |
Dummy3 | result3 | 30 | 102.3 | 104 | 107 | N/A | N/A |
Dummy4 | result4 | 40 | 101.4 | 105 | N/A | N/A | N/A |
Dummy5 | result4 | 50 | N/A | N/A | 110.2 | N/A | N/A |
Any guidance is much appreciated.
Thanks
You can't compare the string 'max' to 100 via >= (aggfunc = 'max' >= 100). Python evaluates that comparison before pivot_table is ever called, which is what raises the TypeError.
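As a small illustration of the point above, the sketch below first reproduces the error outside of pandas, then shows that aggfunc does accept a callable. The helper name max_if_at_least and the tiny sample frame are assumptions made up for this example, not part of the question. Note that a callable like this filters each cell on its own, so it would not keep a value below the threshold just because another value in the same row passes.

```python
import numpy as np
import pandas as pd

# The comparison is evaluated on its own, before pivot_table sees anything.
try:
    _ = 'max' >= 100
except TypeError as exc:
    error_message = str(exc)


def max_if_at_least(threshold):
    """Hypothetical helper: build an aggfunc returning the group max,
    or NaN when that max is below the threshold."""
    def agg(values):
        # Values below the threshold become NaN, and max() skips NaN.
        return values.where(values >= threshold).max()
    return agg


# Tiny made-up sample using the question's column names.
small = pd.DataFrame({
    'col2': ['test1', 'test1', 'test2', 'test2'],
    'col4': ['Dummy1', 'Dummy1', 'Dummy1', 'Dummy2'],
    'col7': [102.2, 106.2, 100.0, 87.0],
})

out = small.pivot_table(values='col7', index='col4', columns='col2',
                        aggfunc=max_if_at_least(100))
print(error_message)
print(out)
```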
I recommend not setting the fill value to a string. Instead, build the pivot table with numeric values, mask the DataFrame to get rid of the undesired rows, and then replace NaN with an empty string via fillna:
columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']
res = df.pivot_table(values='col7', index=['col4', 'col5', 'col6'],
columns=['col2'], aggfunc='max')
col2                   test1  test2  test3  test4  test5
col4   col5    col6
Dummy1 result1 10      102.2  100.0  102.0    NaN    NaN
       result2 10      106.2    NaN    NaN    NaN    NaN
       result6 10      101.1    NaN    NaN    NaN    NaN
Dummy2 result2 20      101.2  103.0   87.0  120.0    NaN
Dummy3 result3 30      102.3  104.0  107.0    NaN    NaN
Dummy4 result4 40      101.4  105.0    NaN    NaN    NaN
Dummy5 result4 50        NaN    NaN  110.2    NaN    NaN
Dummy6 result1 100       NaN    NaN    NaN    NaN   88.0
Keep only the rows where any value satisfies res >= 100, then apply fillna:
# axis must be passed by keyword; pandas 2.0 removed the positional form any(1)
res = res[(res >= 100).any(axis=1)].fillna('')
col2                   test1  test2  test3  test4 test5
col4   col5    col6
Dummy1 result1 10      102.2  100.0  102.0
       result2 10      106.2
       result6 10      101.1
Dummy2 result2 20      101.2  103.0   87.0  120.0
Dummy3 result3 30      102.3  104.0  107.0
Dummy4 result4 40      101.4  105.0
Dummy5 result4 50                    110.2
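To make the row mask concrete, here is a minimal sketch on a two-row frame. The frame and the names pivot, threshold, keep and filtered are assumptions chosen for this illustration. It shows that a comparison against NaN is False, so a row with no value at or above the threshold is dropped, while a row that passes keeps all of its values, including ones below the threshold.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a pivoted result with two rows.
pivot = pd.DataFrame(
    {
        'test1': [101.2, np.nan],
        'test3': [87.0, np.nan],
        'test5': [np.nan, 88.0],
    },
    index=['Dummy2', 'Dummy6'],
)

threshold = 100  # the user-defined criterion

# Element-wise comparison (NaN compares as False), then reduce across each row.
keep = (pivot >= threshold).any(axis=1)

# Boolean indexing keeps whole rows, so 87.0 survives alongside 101.2.
filtered = pivot[keep]
print(keep)
print(filtered)
```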
Optionally, use reset_index to flatten the MultiIndex and rename_axis to clear the column axis name:
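# res was already filtered and filled in the previous step, so only the index
# and axis name need handling here (comparing the '' cells to 100 would fail)
res.reset_index().rename_axis(None, axis=1)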
     col4     col5  col6  test1  test2  test3  test4 test5
0  Dummy1  result1    10  102.2  100.0  102.0
1  Dummy1  result2    10  106.2
2  Dummy1  result6    10  101.1
3  Dummy2  result2    20  101.2  103.0   87.0  120.0
4  Dummy3  result3    30  102.3  104.0  107.0
5  Dummy4  result4    40  101.4  105.0
6  Dummy5  result4    50                110.2
Complete Working Example:
import pandas as pd
df = pd.DataFrame({
'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
'col2': ['test1', 'test1', 'test1', 'test1', 'test2', 'test2', 'test2',
'test2', 'test3', 'test3', 'test3', 'test3', 'test4', 'test5',
'test1', 'test1'],
'col3': ['t1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1', 't1',
't1', 't1', 't1', 't1', 't1'],
'col4': ['Dummy1', 'Dummy2', 'Dummy3', 'Dummy4', 'Dummy1', 'Dummy2',
'Dummy3', 'Dummy4', 'Dummy1', 'Dummy2', 'Dummy3', 'Dummy5',
'Dummy2', 'Dummy6', 'Dummy1', 'Dummy1'],
'col5': ['result1', 'result2', 'result3', 'result4', 'result1', 'result2',
'result3', 'result4', 'result1', 'result2', 'result3', 'result4',
'result2', 'result1', 'result2', 'result6'],
'col6': [10, 20, 30, 40, 10, 20, 30, 40, 10, 20, 30, 50, 20, 100, 10, 10],
'col7': [102.2, 101.2, 102.3, 101.4, 100.0, 103.0, 104.0, 105.0, 102.0,
87.0, 107.0, 110.2, 120.0, 88.0, 106.2, 101.1]
})
columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']
res = df.pivot_table(values='col7', index=['col4', 'col5', 'col6'],
columns=['col2'], aggfunc='max')
res = (
    res[(res >= 100).any(axis=1)].fillna('').reset_index().rename_axis(None, axis=1)
)
print(res)
To get the values without col5, remove it from the index of the pivot_table:
columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']
res = df.pivot_table(values='col7', index=['col4', 'col6'],
columns=['col2'], aggfunc='max')
res = (
    res[(res >= 100).any(axis=1)].fillna('').reset_index().rename_axis(None, axis=1)
)
     col4  col6  test1  test2  test3  test4 test5
0  Dummy1    10  106.2  100.0  102.0
1  Dummy2    20  101.2  103.0   87.0  120.0
2  Dummy3    30  102.3  104.0  107.0
3  Dummy4    40  101.4  105.0
4  Dummy5    50                110.2
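Since the two variants differ only in the index list, the steps can be wrapped in one function. This is a rough sketch: the function name pivot_max_with_threshold, its parameters and the small sample frame are hypothetical and only mirror the column names used in the question.

```python
import pandas as pd


def pivot_max_with_threshold(df, index, threshold=100):
    """Hypothetical wrapper: pivot col7 by col2, keep rows where any max
    reaches the threshold, then blank out NaN and flatten the index."""
    res = df.pivot_table(values='col7', index=index, columns='col2',
                         aggfunc='max')
    # Filter while the table is still numeric, before fillna('').
    res = res[(res >= threshold).any(axis=1)]
    return res.fillna('').reset_index().rename_axis(None, axis=1)


# Made-up sample with the question's column names.
sample = pd.DataFrame({
    'col2': ['test1', 'test3', 'test5'],
    'col4': ['Dummy1', 'Dummy1', 'Dummy6'],
    'col5': ['result1', 'result1', 'result1'],
    'col6': [10, 10, 100],
    'col7': [102.2, 87.0, 88.0],
})

with_col5 = pivot_max_with_threshold(sample, ['col4', 'col5', 'col6'])
without_col5 = pivot_max_with_threshold(sample, ['col4', 'col6'])
print(with_col5)
print(without_col5)
```

Filtering before fillna matters in this sketch, because once the empty strings are in place the comparison against a number no longer works.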
Or you can try:
res = df.assign(col7 = df.col7.where(df.col7 > 100)).pivot_table(values='col7', index=['col4', 'col5', 'col6'],
columns=['col2'], aggfunc='max', fill_value= '')
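One thing worth checking with the where-based variant is that it masks individual values before pivoting, whereas the row mask filters whole rows after pivoting. The sketch below compares the two on a single made-up row. The sample data and variable names are assumptions, and it uses >= 100 for both so the comparison is like for like (with > 100, a value of exactly 100.0 would also be blanked).

```python
import pandas as pd

# Made-up sample: one pivot row with two passing values and one failing value.
df = pd.DataFrame({
    'col2': ['test1', 'test2', 'test3'],
    'col4': ['Dummy2', 'Dummy2', 'Dummy2'],
    'col7': [101.2, 103.0, 87.0],
})

# Variant A: blank out individual values first, then pivot.
masked = df.assign(col7=df['col7'].where(df['col7'] >= 100))
per_value = masked.pivot_table(values='col7', index='col4', columns='col2',
                               aggfunc='max')

# Variant B: pivot first, then keep whole rows where any value passes.
full = df.pivot_table(values='col7', index='col4', columns='col2',
                      aggfunc='max')
per_row = full[(full >= 100).any(axis=1)]

print(per_value)  # 87.0 does not appear here
print(per_row)    # 87.0 is kept because the row has a passing value
```

Which variant fits depends on whether values below the threshold should still be shown in rows that otherwise qualify, as the expected output in the question suggests.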