
python pandas custom agg function

Dataframe:
   one two
a    1   x
b    1   y
c    2   y
d    2   z
e    3   z

grp = df.groupby('one')
grp.agg(lambda x: ???) #or equivalent function

Desired output from grp.agg:

one two
1   x|y
2   y|z
3   z

Before I moved to DataFrames, my aggregation function was "|".join(sorted(set(x))). Ideally the group could have any number of columns, and agg would return "|".join(sorted(set(x))) for each column, as for column two above. I also tried np.char.join().

Love pandas: it has taken me from a complicated 800-line program to a 400-line walk in the park that zooms. Thank you :)

You were so close:

In [1]: df.groupby('one').agg(lambda x: "|".join(x.tolist()))
Out[1]:
     two
one
1    x|y
2    y|z
3      z

Expanded answer that also sorts the values and keeps only the unique ones:

In [1]: df = DataFrame({'one':[1,1,2,2,3], 'two':list('xyyzz'), 'three':list('eecba')}, index=list('abcde'), columns=['one','two','three'])

In [2]: df
Out[2]:
   one two three
a    1   x     e
b    1   y     e
c    2   y     c
d    2   z     b
e    3   z     a

In [3]: df.groupby('one').agg(lambda x: "|".join(x.sort_values().unique().tolist()))  # Series.order() was removed from pandas; sort_values() replaces it
Out[3]:
     two three
one
1    x|y     e
2    y|z   b|c
3      z     a
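
For reference, here is the same idea as a standalone script rather than an IPython session. It is only a sketch: the helper name join_unique is made up for this example, and it uses the "|".join(sorted(set(x))) expression from the question, applied to every non-grouping column.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz"), "three": list("eecba")},
    index=list("abcde"),
)


def join_unique(values, sep="|"):
    """Join the sorted distinct values of one column of one group."""
    # join_unique is a hypothetical helper name, not a pandas function.
    return sep.join(sorted(set(values)))


# agg calls join_unique once per column per group, so this works for
# any number of string columns besides the grouping key.
result = df.groupby("one").agg(join_unique)
print(result)
```

This assumes every aggregated column holds strings; a numeric column or a missing value would make str.join raise a TypeError.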

The pandas documentation describes a dedicated method for concatenating strings, Series.str.cat, so I prefer this way:

In [1]: df.groupby('one').agg(lambda x: x.str.cat(sep='|'))
Out[1]:
     two
one
1    x|y
2    y|z
3      z
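
The str.cat approach can be combined with the sorting and de-duplication the question asks for. The sketch below is one way to do it; cat_unique is a name invented for the example. It also shows one practical difference from "|".join: with the default na_rep=None, str.cat leaves missing values out instead of raising.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz"), "three": list("eecba")},
    index=list("abcde"),
)


def cat_unique(s):
    # cat_unique is a hypothetical helper name for this example.
    # Drop repeated values, sort what is left, then join with "|".
    return s.drop_duplicates().sort_values().str.cat(sep="|")


result = df.groupby("one").agg(cat_unique)
print(result)

# With the default na_rep=None, str.cat omits missing values.
with_gap = pd.Series(["x", None, "y"]).str.cat(sep="|")
print(with_gap)
```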

Just an elaboration on the accepted answer:

df.groupby('one').agg(lambda x: "|".join(x.tolist()))

Note that df.groupby('one') returns a DataFrameGroupBy (selecting a single column first, as in df.groupby('one')['two'], gives a SeriesGroupBy), and agg is defined on both types. When agg is given a single function like this, it calls that function on each column of each group, so x in the lambda above is a Series.

Also, the aggregation function does not have to be a lambda. If it is complex, it can be defined separately as a regular function, as below. The only constraint is that it must accept a Series (or something compatible with one) as x:

def myfun1(x):
    return "|".join(x.tolist())

and then:

df.groupby('one').agg(myfun1)
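
A quick way to check what agg actually hands to the function is to record the type of x inside it. The sketch below does that, and also shows that a dict can map each column to its own function; the seen_types set is just scaffolding for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz"), "three": list("eecba")},
    index=list("abcde"),
)

seen_types = set()  # scaffolding for the example: records what agg passes in


def myfun1(x):
    seen_types.add(type(x).__name__)
    return "|".join(x.tolist())


grouped = df.groupby("one")
print(type(grouped).__name__)

# Same function for every column: values are joined as-is, duplicates included.
result = grouped.agg(myfun1)
print(seen_types)

# A different function per column, passed as a dict.
per_column = grouped.agg(
    {"two": myfun1, "three": lambda x: "|".join(sorted(set(x)))}
)
print(per_column)
```

Because myfun1 neither sorts nor de-duplicates, column three comes out as e|e for group 1, while the per-column version gives e for the same group.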
