Pandas DataFrame: How to calculate a new column with Price divided by number of lines of a group category?

My DataFrame has Names, Group, and Price columns.

The Price value is the total price paid by the whole group. For each row, I need to calculate the mean price per person in that row's group.

E.g., for the members of group A, I need to divide the price paid, 12, by the number of people in the group, 3.

So ppp (price per person) should be 12/3 = 4.

import pandas as pd

df = pd.DataFrame(
    data = [('Bob', 'A', 12),
            ('Jessica', 'A',12),
            ('Kevin', 'A',12),
            ('Mary', 'B',5),
            ('John', 'C',14),
            ('Mel', 'C',14)
            ],
    columns=['Names', 'Group', 'Price']
)

I tried this:

a=df.groupby('Group')['Price'].max()
b=df.groupby('Group')['Price'].count()
df.groupby('Group')['Price'].max() / df.groupby('Group')['Price'].count()
ppp = a/b

df['ppp']=0

for a in df.itertuples():
    print(a)
    print(a.Group)
    a.ppp = ppp[a.Group]

But I get an error: AttributeError: can't set attribute

The expected result is:

df = pd.DataFrame(
    data = [('Bob', 'A', 12, 4),
            ('Jessica', 'A',12, 4),
            ('Kevin', 'A',12, 4),
            ('Mary', 'B',5, 5),
            ('John', 'C',14, 7),
            ('Mel', 'C',14, 7)
            ],
    columns=['Names', 'Group', 'Price', 'ppp']
)

Could you tell me what's wrong, and also how to do this without iterating if possible?

Try transform, which returns one value per row of the original DataFrame, aligned on its index:

a = df.groupby('Group')['Price'].transform('max')
b = df.groupby('Group')['Price'].transform('count')
df['ppp'] = a/b
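
As for what went wrong in the question: itertuples() yields namedtuples, which are immutable copies of each row, so assigning to an attribute raises AttributeError and would not modify df in any case. Below is a minimal, self-contained sketch of the transform approach using the question's data. The names grouped, group_price, group_size and raised are only illustrative and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame(
    data=[('Bob', 'A', 12),
          ('Jessica', 'A', 12),
          ('Kevin', 'A', 12),
          ('Mary', 'B', 5),
          ('John', 'C', 14),
          ('Mel', 'C', 14)],
    columns=['Names', 'Group', 'Price'],
)

# Rows from itertuples() are namedtuples: read-only copies of the data.
row = next(df.itertuples())
raised = False
try:
    row.Price = 0  # any attribute assignment on a namedtuple fails
except AttributeError:
    raised = True

# transform() broadcasts each group's aggregate back to every row of that
# group, so the results line up with df's index and can be divided directly.
grouped = df.groupby('Group')['Price']
group_price = grouped.transform('max')    # total price paid by the group
group_size = grouped.transform('count')   # number of rows in the group
df['ppp'] = group_price / group_size

print(df)
```

Because both intermediate Series share df's index, no loop or merge is needed to write the result back.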

Another way is to build a per-group mapping and map it back onto the original DataFrame (transform is more idiomatic, though):

mapping = df.groupby("Group").Price.pipe(lambda x: x.max() / x.count())
mapping

Group
A    4.0
B    5.0
C    7.0
Name: Price, dtype: float64


df.assign(ppp=df.Group.map(mapping))

     Names Group  Price  ppp
0      Bob     A     12  4.0
1  Jessica     A     12  4.0
2    Kevin     A     12  4.0
3     Mary     B      5  5.0
4     John     C     14  7.0
5      Mel     C     14  7.0
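
One detail worth keeping in mind with this approach: assign returns a new DataFrame and leaves df untouched, so the result has to be bound to a name (or assigned back to df) to keep the column. A minimal sketch using the question's data, where the name result is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    data=[('Bob', 'A', 12),
          ('Jessica', 'A', 12),
          ('Kevin', 'A', 12),
          ('Mary', 'B', 5),
          ('John', 'C', 14),
          ('Mel', 'C', 14)],
    columns=['Names', 'Group', 'Price'],
)

# One value per group: group price divided by group size.
grouped = df.groupby('Group')['Price']
mapping = grouped.max() / grouped.count()

# map() looks up each row's Group in the mapping's index;
# assign() returns a copy of df with the extra column.
result = df.assign(ppp=df['Group'].map(mapping))

print('ppp' in df.columns)   # the original frame is unchanged
print(result)
```

Writing df = df.assign(...) instead would overwrite the original frame with the extended one.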

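This is an ugly solution, but it works. Note that it re-filters the whole DataFrame for every row, so it will be slow on large data:

df['ppp'] = df.apply(
    lambda row: df.loc[df['Group'] == row.Group, 'Price'].max()
    / df.loc[df['Group'] == row.Group, 'Price'].count(),
    axis=1,
)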
