
Setting a calculated value for a column for each group in a DataFrame

I have a dataframe that I need to group by column x, setting every value of column a within a group to a calculated value that is constant for that group.

I start with a dataframe like this:

x     |   a  |   b 
------+------+-----   
a     |  -1  |  ...
b     |  -1  |  ...
c     |  -1  |  ...
a     |  -1  |  ...
b     |  -1  |  ...
c     |  -1  |  ...

and want to transform it into the dataframe below by grouping by column x and setting column a to the return value of function f:

p = ["k", "l"]

def f(group_number, values):
    # cycle through values so that any group number maps to an entry
    return values[group_number % len(values)]

x     |   a               |   b 
------+-------------------+-----   
a     |  f(ngroup(a), p)  |  ...
b     |  f(ngroup(b), p)  |  ...
c     |  f(ngroup(c), p)  |  ...
a     |  f(ngroup(a), p)  |  ...
b     |  f(ngroup(b), p)  |  ...
c     |  f(ngroup(c), p)  |  ...

ngroup is a function that does exactly what pandas.core.groupby.GroupBy.ngroup() does: it returns a number for every group.
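As a minimal sketch of that numbering, assuming the default groupby behaviour of sorting the group keys (the one-column frame here is made up for illustration):

```python
import pandas as pd

# Illustrative frame containing only the grouping column from the example above.
df = pd.DataFrame({"x": ["a", "b", "c", "a", "b", "c"]})

# ngroup() returns one integer per row identifying the group that row belongs to.
group_numbers = df.groupby("x").ngroup()
print(group_numbers.tolist())  # [0, 1, 2, 0, 1, 2]
```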

The overall result should be

x     |  a  |   b 
------+-----+-----   
a     |  k  |  ...
b     |  l  |  ...
c     |  k  |  ...
a     |  k  |  ...
b     |  l  |  ...
c     |  k  |  ...

where all rows with x = a have the same value (k), all rows with x = b have the value l, and all rows with x = c have the value k as well.

How can I achieve this?

What you want to write is:

df['a'] = p[df.groupby('x').ngroup() % len(p)]  # TypeError here

Unfortunately, a Python list cannot be indexed with a Series, so this raises:

TypeError: list indices must be integers or slices, not Series

NumPy ndarrays, however, can be indexed with an array of integers, so you can just do:

import numpy as np

df['a'] = np.array(p)[df.groupby('x').ngroup() % len(p)]
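For reference, here is a self-contained sketch of the whole approach. The sample frame mirrors the one in the question, but the contents of column b are invented, and the names group_numbers and via_map are only illustrative. The last line shows a possible alternative using Series.map, which may be handy if f cannot be expressed as plain array indexing.

```python
import numpy as np
import pandas as pd

p = ["k", "l"]

# Sample frame mirroring the question; the contents of column b are made up.
df = pd.DataFrame({"x": list("abcabc"), "a": -1, "b": range(6)})

# One integer per row identifying its group (a -> 0, b -> 1, c -> 2 with the default sort).
group_numbers = df.groupby("x").ngroup()

# NumPy integer-array indexing picks p[n % len(p)] for every row at once.
df["a"] = np.array(p)[group_numbers.to_numpy() % len(p)]

# Alternative without NumPy: apply the lookup to each group number with Series.map.
via_map = group_numbers.map(lambda n: p[n % len(p)])
```

Both variants should give k, l, k, k, l, k for the six rows, matching the expected result in the question.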
