I have a dataset with 3 columns: Category, Country, and Count (which is always 1 - and is pretty useless, actually).
What I want to achieve is something like the yellow column here:
I could do a simple group by in Python, but that's not what I want, because I want to preserve the individual rows of the data, unlike the image below (which groups them):
I just want the frequency based on both columns, without grouping the rows. Any ideas? I thought about iterating with for loops but couldn't get it to work. I'm kind of a beginner in Python, so your help is deeply appreciated.
It seems like you want to use transform here. It creates a new column in your dataframe that holds each row's group-level summary statistic, so the original rows are preserved rather than collapsed.
import pandas as pd
df = pd.DataFrame({
    'category_cluster': ['Assault', 'Assault', 'Assault', 'Assault', 'Assault', 'Assault', 'Assault'],
    'Country': ['Egypt', 'India', 'India', 'Mexico', 'Mexico', 'Mexico', 'Morocco'],
    'Count': [1, 1, 1, 1, 1, 1, 1],
})
# Sum Count within each (category_cluster, Country) group and broadcast the total back to every row
df['new_column'] = df.groupby(['category_cluster', 'Country'])['Count'].transform('sum')
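Since Count is always 1, the same frequency can probably be obtained without that column at all, by counting the rows in each group. The following is only a minimal sketch under that assumption: it reuses the sample data above, and the column name freq is just an illustrative choice, not something from the question.

```python
import pandas as pd

# Same sample data as above, without the constant Count column.
df = pd.DataFrame({
    'category_cluster': ['Assault'] * 7,
    'Country': ['Egypt', 'India', 'India', 'Mexico', 'Mexico', 'Mexico', 'Morocco'],
})

# 'size' counts the rows in each (category_cluster, Country) group;
# transform broadcasts that count back onto every original row,
# so the dataframe keeps all 7 rows.
# 'freq' is a hypothetical column name chosen for this sketch.
df['freq'] = df.groupby(['category_cluster', 'Country'])['Country'].transform('size')

print(df)
# freq column: [1, 2, 2, 3, 3, 3, 1]
```

The difference from a plain groupby followed by an aggregation is that transform returns a result aligned with the original index, which is what lets it be assigned straight back as a column.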