Python - Group by with multiple conditions on columns
I have the following example dataframe:
import pandas as pd

data = {'ref': ['1', '2', '3', '4', '5'],
        'checked': [True, True, True, False, True],
        'rag': ['r', 'r', 'g', 'a', 'r'],
        'group': ['high', 'low', 'high', 'medium', 'high']}
dataframe = pd.DataFrame(data)
I want to group on the group column and do some conditional counts where certain conditions are met, so that I get the following:
I can group by the group column and compute n (the number of rows per group) as follows:
df = dataframe.groupby(['group']).agg(
n=('ref', 'count')
).reset_index()
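Since checked now holds real booleans, a minimal sketch of extending this same named-aggregation call: summing a boolean column counts its True values, and a lambda can count rows matching any other condition. The output names checked_true and rag_r are illustrative choices, not taken from the question; keyword names here must be valid Python identifiers, so a label like checked=True cannot be used directly as a keyword.

```python
import pandas as pd

data = {'ref': ['1', '2', '3', '4', '5'],
        'checked': [True, True, True, False, True],
        'rag': ['r', 'r', 'g', 'a', 'r'],
        'group': ['high', 'low', 'high', 'medium', 'high']}
dataframe = pd.DataFrame(data)

df = dataframe.groupby(['group']).agg(
    n=('ref', 'count'),
    # Summing a boolean column counts its True values per group
    checked_true=('checked', 'sum'),
    # A lambda counts rows meeting a condition on a non-boolean column
    rag_r=('rag', lambda s: s.eq('r').sum()),
).reset_index()
print(df)
```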
But I am struggling to also count the number of times for each group that:
Any help would be much appreciated!
Edit: changed the True/False strings to booleans.
You have a few challenges.
For instance, in the original version of the question your True/False values were strings, so you should either initialize them as booleans and use sum, or convert them to booleans during aggregation.
To count the rag values, it's easier to use pandas.crosstab and join the result while you still have the groups as the index.
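df = (dataframe
      .groupby(['group'])
      .agg(**{'n': ('ref', 'count'),
              # astype(str) makes this work whether 'checked' holds booleans or 'True'/'False' strings
              'checked=True': ('checked', lambda s: s.astype(str).eq('True').sum()),
              })
      # crosstab is indexed by group, so it aligns with the groupby result
      .join(pd.crosstab(dataframe['group'], dataframe['rag'])
              .add_prefix('rag=')
            )
      .reset_index()
      )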
Output:
group n checked=True rag=a rag=g rag=r
0 high 3 3 0 1 2
1 low 1 1 0 0 1
2 medium 1 0 1 0 0
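To see why the join lines up, here is a small sketch of the crosstab on its own next to the grouped count: both are indexed by the group labels before reset_index() is called, so join() matches rows by group. The variable names rag_counts, n and combined are illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame({'ref': ['1', '2', '3', '4', '5'],
                          'checked': [True, True, True, False, True],
                          'rag': ['r', 'r', 'g', 'a', 'r'],
                          'group': ['high', 'low', 'high', 'medium', 'high']})

# Rows: one per group; columns: one per rag value; cells: row counts
rag_counts = pd.crosstab(dataframe['group'], dataframe['rag']).add_prefix('rag=')

# The groupby result is indexed by the same group labels
n = dataframe.groupby('group').agg(n=('ref', 'count'))

# join() aligns the two frames on the shared group index
combined = n.join(rag_counts)
print(combined)
```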
You can try pivot_table separately on your checked and rag columns:
n = dataframe.groupby(['group']).agg(n=('ref', 'count'))

dfs = []
for column in ['checked', 'rag']:
    # One count column per distinct value of `column`, e.g. checked=True, rag=a
    df_ = (dataframe.pivot_table(index='group', columns=[column], values='ref',
                                 aggfunc='count', fill_value=0)
           .rename(columns=lambda col: f'{column}={col}')
           .rename_axis(None, axis=1))
    dfs.append(df_)

# Keep only the checked=True count
counts = pd.concat(dfs, axis=1).drop('checked=False', axis=1)
print(n.join(counts))
n checked=True rag=a rag=g rag=r
group
high 3 3 0 1 2
low 1 1 0 0 1
medium 1 0 1 0 0
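A related option, not used in the answers here, is to one-hot encode rag with pd.get_dummies and then sum everything per group in a single groupby. This is a sketch assuming checked already holds booleans; the prefix_sep='=' argument is only there to reproduce the rag=a style column names.

```python
import pandas as pd

dataframe = pd.DataFrame({'ref': ['1', '2', '3', '4', '5'],
                          'checked': [True, True, True, False, True],
                          'rag': ['r', 'r', 'g', 'a', 'r'],
                          'group': ['high', 'low', 'high', 'medium', 'high']})

# One indicator column per rag value, named rag=a, rag=g, rag=r
dummies = pd.get_dummies(dataframe['rag'], prefix='rag', prefix_sep='=')

# Within each group, summing indicator/boolean columns counts the True rows
result = (pd.concat([dataframe[['group', 'checked']], dummies], axis=1)
          .groupby('group')
          .sum()
          .rename(columns={'checked': 'checked=True'}))

# Row count per group as the first column
# (size() counts rows; it equals count() on 'ref' here because 'ref' has no missing values)
result.insert(0, 'n', dataframe.groupby('group').size())
print(result)
```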
There is an issue with your data example:
import pandas as pd

data = {'ref': ['1', '2', '3', '4', '5'],
        'checked': [True, True, True, False, True],
        'rag': ['r', 'r', 'g', 'a', 'r'],
        'group': ['high', 'low', 'high', 'medium', 'high']}
df = pd.DataFrame(data)
For the checked column, you should enter the values as True/False without quotation marks. Otherwise, Python will interpret 'True'/'False' as strings.
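If the data already arrives with 'True'/'False' strings, a minimal sketch of converting them is to compare against the literal string. Note that astype(bool) is not a safe shortcut, because every non-empty string, including 'False', is truthy. The string-valued input below is a hypothetical reconstruction of the question's original data.

```python
import pandas as pd

# Hypothetical input where 'checked' was entered as strings
df = pd.DataFrame({'ref': ['1', '2', '3', '4', '5'],
                   'checked': ['True', 'True', 'True', 'False', 'True'],
                   'rag': ['r', 'r', 'g', 'a', 'r'],
                   'group': ['high', 'low', 'high', 'medium', 'high']})

# Pitfall: non-empty strings are truthy, so 'False' becomes True
naive = pd.Series(['False']).astype(bool).tolist()

# Correct conversion: 'True' -> True, anything else -> False
df['checked'] = df['checked'].eq('True')
print(df['checked'].dtype, df['checked'].sum())
```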
The idea here has two steps: (1) iterate over each column and each of its unique values, counting matching rows per group with groupby; (2) then merge/concat the results into one table:
# Collect one count Series per (column, value) pair
counts = []
# List of columns you want to iterate over
col_iter = ['checked', 'rag']
for col in col_iter:
    # Count each unique value in col separately
    for val in df[col].unique():
        # Per group: number of rows where df[col] == val, named e.g. 'rag=r'
        tem = df[col].eq(val).groupby(df['group']).sum().rename(f'{col}={val}')
        counts.append(tem)
# Put all the counts side by side in one table
table = pd.concat(counts, axis=1)
print(table)
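The loop builds only the per-value counts. A sketch of finishing the table so it matches the other answers' output: prepend the n column, drop checked=False, and sort the remaining columns. The list comprehension is just a compact form of the same per-value counting; the column handling is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'ref': ['1', '2', '3', '4', '5'],
                   'checked': [True, True, True, False, True],
                   'rag': ['r', 'r', 'g', 'a', 'r'],
                   'group': ['high', 'low', 'high', 'medium', 'high']})

# Per-value counts per group, one column per (column, value) pair
table = pd.concat(
    [df[col].eq(val).groupby(df['group']).sum().rename(f'{col}={val}')
     for col in ['checked', 'rag'] for val in df[col].unique()],
    axis=1)

# Drop the unwanted column, sort the rest, then prepend the row count per group
table = table.drop(columns='checked=False')
table = table[sorted(table.columns)]
table.insert(0, 'n', df.groupby('group').size())
print(table)
```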