python pandas dataframe aggregate rows
I have a dataframe like this:
   id   K  V
0   1  k1  3
1   1  k2  4
2   1  k2  5
3   1  k1  5
4   2  k1  2
5   2  k1  3
6   2  k2  3
I also have a set of conditions, such as k1 > 1 and k2 < 4.
I want to evaluate these conditions and create a new dataframe with one row per id and one column per condition:
   id  k1_condition  k2_condition
0   1          True         False
1   2          True          True
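For anyone who wants to reproduce this, the sample frame and the expected result can be rebuilt as below. This is a minimal sketch copied from the tables in the question; the variable name expected is only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "K": ["k1", "k2", "k2", "k1", "k1", "k1", "k2"],
    "V": [3, 4, 5, 5, 2, 3, 3],
})

# Desired result, also copied from the question ("expected" is an illustrative name).
expected = pd.DataFrame({
    "id": [1, 2],
    "k1_condition": [True, True],
    "k2_condition": [False, True],
})
```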
Try the following:
# function to be applied to the column 'V' of each (id, ki) group
def conditions(g):
    cond_dict = {
        'k1': lambda k1: k1 > 1,
        'k2': lambda k2: k2 < 4
    }
    _, k = g.name  # g.name = group key = (id, ki)
    return cond_dict[k](g).all()

out = (
    df.groupby(['id', 'K'])['V']
    .apply(conditions)
    .unstack('K')               # turn k1 and k2 into columns
    .add_suffix('_cond')        # add suffix to column names: ki --> ki_cond
    .rename_axis(columns=None)  # remove the column axis label (K)
    .reset_index()              # make id a column, not the index
)
Output:
>>> out
   id  k1_cond  k2_cond
0   1     True    False
1   2     True     True
You can easily add more conditions to cond_dict if the column K contains other values besides k1 and k2.
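As a rough illustration of adding a condition, here is a sketch with a made-up third key. The key k3, its two extra rows, and the rule v == 10 are assumptions for the example, not part of the question.

```python
import pandas as pd

# Sample data from the question plus two rows for a hypothetical key 'k3'.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 1, 2],
    "K": ["k1", "k2", "k2", "k1", "k1", "k1", "k2", "k3", "k3"],
    "V": [3, 4, 5, 5, 2, 3, 3, 10, 7],
})

cond_dict = {
    "k1": lambda v: v > 1,
    "k2": lambda v: v < 4,
    "k3": lambda v: v == 10,  # hypothetical extra condition
}

def conditions(g):
    _, k = g.name                 # group key is the tuple (id, K)
    return cond_dict[k](g).all()  # True only if every value in the group passes

out = (
    df.groupby(["id", "K"])["V"]
    .apply(conditions)
    .unstack("K")
    .add_suffix("_cond")
    .rename_axis(columns=None)
    .reset_index()
)
```

Only the dictionary changes; the groupby pipeline stays the same, and the new key shows up as an extra k3_cond column.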
DataFrame.apply should work:
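# Parenthesize each comparison: & binds more tightly than == and >
df["k1_condition"] = df.apply(lambda x: (x["K"] == "k1") & (x["V"] > 1), axis=1)
df["k2_condition"] = df.apply(lambda x: (x["K"] == "k2") & (x["V"] < 4), axis=1)
# any() marks a condition True when at least one row for that id satisfies it
df2 = df[["id", "k1_condition", "k2_condition"]].groupby("id").any()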
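If a condition should hold for every row of a key rather than for at least one, the flags can also be built without apply. This is a minimal sketch assuming the sample data from the question; is_k1, is_k2 and flags are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "K": ["k1", "k2", "k2", "k1", "k1", "k1", "k2"],
    "V": [3, 4, 5, 5, 2, 3, 3],
})

is_k1 = df["K"] == "k1"
is_k2 = df["K"] == "k2"

flags = pd.DataFrame({
    "id": df["id"],
    # A row passes if it is not a k1 row, or if it is one and V > 1.
    "k1_condition": ~is_k1 | (df["V"] > 1),
    # A row passes if it is not a k2 row, or if it is one and V < 4.
    "k2_condition": ~is_k2 | (df["V"] < 4),
})

# all() requires every row of an id to pass.
df2 = flags.groupby("id").all().reset_index()
```

One caveat with this sketch: an id that has no rows for a key at all would come out as True for that key, since there is nothing to violate the condition.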
Here is one way to do it:
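# Per id and key, compute both the minimum and the maximum of V
df2 = df.pivot_table(index='id', columns='K', values='V', aggfunc=['min', 'max']).reset_index()
# Flatten the MultiIndex columns: ('min', 'k1') -> 'min_k1', ('id', '') -> 'id_'
df2.columns = ['_'.join(col) for col in df2.columns]
df2['k1_condition'] = df2['min_k1'] > 1  # all k1 values > 1 iff the minimum is > 1
df2['k2_condition'] = df2['max_k2'] < 4  # all k2 values < 4 iff the maximum is < 4
df2 = df2.drop(columns=['min_k1', 'min_k2', 'max_k1', 'max_k2'])
df2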
Or:
df2 = df.pivot_table(index='id', columns='K', values='V', aggfunc=['min', 'max']).reset_index()
df2['k1_condition'] = df2['min']['k1'] > 1
df2['k2_condition'] = df2['max']['k2'] < 4
df2.drop(columns=['min', 'max'], level=0, inplace=True)
df2
   id_  k1_condition  k2_condition
0    1          True         False
1    2          True          True
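The min/max approach relies on the fact that every value in a group exceeds 1 exactly when the group minimum does, and every value is below 4 exactly when the group maximum is. A quick check of that equivalence on the question's sample data, as a sketch with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "K": ["k1", "k2", "k2", "k1", "k1", "k1", "k2"],
    "V": [3, 4, 5, 5, 2, 3, 3],
})

grouped = df.groupby(["id", "K"])["V"]

# Element-wise test followed by all(), versus a test on the aggregate.
all_above_1 = grouped.apply(lambda v: (v > 1).all())
min_above_1 = grouped.min() > 1

all_below_4 = grouped.apply(lambda v: (v < 4).all())
max_below_4 = grouped.max() < 4

# Both pairs agree for every (id, K) group.
same_lower = bool((all_above_1 == min_above_1).all())
same_upper = bool((all_below_4 == max_below_4).all())
```

The aggregate form avoids calling a Python function per group, which may matter on larger frames.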
You could use pivot_table with a conditions function.
def conditions(x):
    # Look up the key (k1 or k2) for this group from its first row
    k = df.at[x.index[0], 'K']
    if k == 'k1':
        return (x > 1).all()
    return (x < 4).all()

pd.pivot_table(df, index='id', columns='K', aggfunc=conditions) \
    .droplevel(level=0, axis=1).add_suffix('_condition') \
    .rename_axis(None, axis=1).reset_index()
Result:
   id  k1_condition  k2_condition
0   1          True         False
1   2          True          True