
python pandas dataframe aggregate rows

I have a dataframe like so.

   id   K  V
0   1  k1  3
1   1  k2  4
2   1  k2  5
3   1  k1  5
4   2  k1  2
5   2  k1  3
6   2  k2  3
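
To make the example reproducible, the frame above can be built as follows (a minimal sketch; the variable name df is an assumption that matches the answers below):

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "K": ["k1", "k2", "k2", "k1", "k1", "k1", "k2"],
    "V": [3, 4, 5, 5, 2, 3, 3],
})

print(df)
```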

I also have a set of conditions, such as k1 > 1 and k2 < 4.

I want to evaluate the conditions and create a new dataframe containing one row per id and one column per condition.

   id  k1_condition  k2_condition
0   1  True          False
1   2  True          True

Try the following

# function to be applied to the column 'V' of each (id, ki) group
def conditions(g):
    cond_dict = {
        'k1': lambda k1: k1 > 1,
        'k2': lambda k2: k2 < 4
    }
    _ , k = g.name  # g.name = group key = (id, ki)
    return cond_dict[k](g).all()

out = (
    df.groupby(['id', 'K'])['V']
      .apply(conditions) 
      .unstack('K') # turn k1 and k2 into columns 
      .add_suffix('_cond') # add suffix to column names: ki --> ki_cond
      .rename_axis(columns=None) # remove the column axis label (K)
      .reset_index() # make id a column, not the index       
)

Output:

>>> out

   id  k1_cond  k2_cond
0   1     True    False
1   2     True     True

You can easily add more conditions to cond_dict if the column K contains other values besides k1 and k2.
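
As a sketch of what extending cond_dict might look like, the example below adds a third key. The key k3, its rule (V == 5), and the two extra rows are all hypothetical and only serve as illustration; cond_dict is also moved to module level here so it is easier to edit.

```python
import pandas as pd

# Sample data from the question plus two hypothetical 'k3' rows.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 1, 2],
    "K": ["k1", "k2", "k2", "k1", "k1", "k1", "k2", "k3", "k3"],
    "V": [3, 4, 5, 5, 2, 3, 3, 5, 7],
})

# One predicate per value of K; the 'k3' entry is an assumption.
cond_dict = {
    "k1": lambda v: v > 1,
    "k2": lambda v: v < 4,
    "k3": lambda v: v == 5,
}

def conditions(g):
    # g.name is the group key (id, K); pick the predicate for this K.
    _, k = g.name
    return cond_dict[k](g).all()

out = (
    df.groupby(["id", "K"])["V"]
      .apply(conditions)
      .unstack("K")               # one column per value of K
      .add_suffix("_cond")
      .rename_axis(columns=None)
      .reset_index()
)

print(out)
```

A value of K that has no entry in cond_dict would raise a KeyError in this sketch, so every key present in the data needs a predicate.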

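DataFrame.apply should work:

# Parentheses are required: & binds more tightly than == and >.
df["k1_condition"] = df.apply(lambda x: (x["K"] == "k1") & (x["V"] > 1), axis=1)
df["k2_condition"] = df.apply(lambda x: (x["K"] == "k2") & (x["V"] < 4), axis=1)
# any() marks an id as True if at least one of its rows satisfies the condition.
df2 = df[["id", "k1_condition", "k2_condition"]].groupby("id").any()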
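
The same per-row flags can also be computed without a row-wise apply, using boolean column operations. This is a sketch rather than part of the answer above; it uses the conditions k1 > 1 and k2 < 4 from the question, and the intermediate name flags is an assumption. As with any(), an id is flagged when at least one of its rows satisfies the condition, which is not the same as requiring all rows to satisfy it.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "K": ["k1", "k2", "k2", "k1", "k1", "k1", "k2"],
    "V": [3, 4, 5, 5, 2, 3, 3],
})

# Per-row flags built column-wise; each comparison is parenthesized
# because & has higher precedence than == and >.
flags = pd.DataFrame({
    "id": df["id"],
    "k1_condition": (df["K"] == "k1") & (df["V"] > 1),
    "k2_condition": (df["K"] == "k2") & (df["V"] < 4),
})

# Collapse to one row per id: True if any row of that id is flagged.
df2 = flags.groupby("id").any().reset_index()

print(df2)
```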

Here is one way to do it:

df2=df.pivot_table(index='id',columns='K', values='V', aggfunc=['min','max']).reset_index()
df2.columns = ['_'.join(col) for col in df2.columns ]
df2['k1_condition'] = df2['min_k1'] > 1
df2['k2_condition'] = df2['max_k2'] <4
df2=df2.drop(columns=['min_k1','min_k2','max_k1','max_k2'])
df2

OR

df2=df.pivot_table(index='id',columns='K', values='V', aggfunc=['min','max']).reset_index()
df2['k1_condition'] = df2['min']['k1'] > 1
df2['k2_condition'] = df2['max']['k2'] <4
df2.drop(columns=['min','max'],level=0,inplace=True)
df2

    id_     k1_condition    k2_condition
0   1       True            False
1   2       True            True

You could use pivot_table with a conditions function.

def conditions(x):
    k = df.at[x.index[0],'K']

    if k == 'k1':
        return (x>1).all()
    
    return (x<4).all()

pd.pivot_table(df, index='id', columns='K', aggfunc=conditions) \
    .droplevel(level=0, axis=1).add_suffix('_condition') \
    .rename_axis(None, axis=1).reset_index()  

Result

   id  k1_condition  k2_condition
0   1          True         False
1   2          True          True

