Add new column to dataframe based on an average

Question

I have a dataframe that includes the category of a project, currency, number of investors, goal, etc., and I want to create a new column which will be "average success rate of their category":

   state        category main_category currency  backers country  \
0      0          Poetry    Publishing      GBP        0      GB
1      0  Narrative Film  Film & Video      USD       15      US
2      0  Narrative Film  Film & Video      USD        3      US
3      0           Music         Music      USD        1      US
4      1     Restaurants          Food      USD      224      US

   usd_goal_real  duration  year       hour
0        1533.95        59  2015    morning
1       30000.00        60  2017    morning
2       45000.00        45  2013    morning
3        5000.00        30  2012    morning
4       50000.00        35  2016  afternoon

I have the average success rates in series format:

Dance           65.435209
Theater         63.796134
Comics          59.141527
Music           52.660558
Art             44.889045
Games           43.890467
Film & Video    41.790649
Design          41.594386
Publishing      34.701650
Photography     34.110847
Fashion         28.283186
Technology      23.785582

Now I want to add a new column in which each row holds the success rate of its category, i.e. wherever the row's category is Technology, the new column should contain 23.78 for that row.

In other words, df['category_success_rate'] should hold the success percentage that matches the category in the "main_category" column.

Answer 1

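I think you need GroupBy.transform on a Boolean mask, built with df['state'].eq(1) or (df['state'] == 1). The mean of the mask within each main_category is the fraction of rows where state equals 1, and transform returns that value aligned to the original rows: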

df['category_success_rate'] = (df['state'].eq(1)
                                 .groupby(df['main_category']).transform('mean') * 100)
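
To see what this produces, here is a minimal sketch on a made-up five-row frame. The data is illustrative only and is not taken from the question; it assumes, as the answer does, that state == 1 marks a successful project.

```python
import pandas as pd

# Toy data (hypothetical values), using the same column names as the question.
df = pd.DataFrame({
    "state": [0, 1, 0, 0, 1],
    "main_category": ["Film & Video", "Film & Video", "Music", "Publishing", "Food"],
})

# Boolean mask: True where the project succeeded (assumed to mean state == 1).
success = df["state"].eq(1)

# Mean of the mask per category is the share of successes; transform
# broadcasts that group-level value back onto every row of the group.
df["category_success_rate"] = (
    success.groupby(df["main_category"]).transform("mean") * 100
)

print(df)
```

Both "Film & Video" rows get 50.0 because one of the two projects in that group succeeded, and the result has the same length and index as df, so it can be assigned directly as a column.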

Alternatively, build the same mask with the == operator:

df['category_success_rate'] = ((df['state'] == 1)
                                 .groupby(df['main_category']).transform('mean') * 100)
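
If the success rates are already available as a Series indexed by category, as in the question, another option is to look them up with Series.map instead of recomputing them. This is a sketch under assumptions: the name success_rates is hypothetical, the frame is a small stand-in, and "Food" is deliberately left out of the toy Series to show what happens to unmatched categories.

```python
import pandas as pd

# Small stand-in for the question's dataframe (illustrative rows only).
df = pd.DataFrame({
    "state": [0, 0, 0, 0, 1],
    "main_category": ["Publishing", "Film & Video", "Film & Video", "Music", "Food"],
})

# Precomputed rates indexed by main category; `success_rates` is a
# hypothetical name, with a few of the values quoted in the question.
success_rates = pd.Series({
    "Music": 52.660558,
    "Film & Video": 41.790649,
    "Publishing": 34.701650,
})

# map() looks each main_category value up in the Series index.
df["category_success_rate"] = df["main_category"].map(success_rates)

print(df)
```

Categories that do not appear in the Series index come back as NaN, so it may be worth checking df['category_success_rate'].isna() after mapping.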
