
Count frequency based on two columns without group by

I have a dataset with 3 columns: Category, Country, and Count (which is always 1 - and is pretty useless, actually).

What I want to achieve is something like the yellow column here:

[Image 1: what I want]

I could do a simple group by in Python, but that's not what I want, because I want to preserve the individual rows of the data, unlike the image below (which groups them):

[Image 2: what I did, which is not what I want (grouped)]

I just want to get the frequency based on both columns without grouping the rows. Any ideas? I thought about iterating with for loops, but I couldn't get that to work. I'm kind of a beginner in Python, so your help is deeply appreciated.

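It seems like you want to use transform here. Unlike a plain group-by aggregation, which returns one row per group, transform returns a result aligned to the original index, so every row is preserved and each one receives the summary statistic of its group. Assigning that result creates a new column in your dataframe with the values you are looking for.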

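import pandas as pd

# Sample data mirroring the question: one row per incident, Count is always 1
df = pd.DataFrame({'category_cluster': ['Assault', 'Assault', 'Assault', 'Assault', 'Assault', 'Assault', 'Assault'],
                   'Country': ['Egypt', 'India', 'India', 'Mexico', 'Mexico', 'Mexico', 'Morocco'],
                   'Count': [1, 1, 1, 1, 1, 1, 1]})

# Sum Count within each (category_cluster, Country) pair and write the total back onto every row
df['new_column'] = df.groupby(['category_cluster', 'Country'])['Count'].transform('sum')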
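
Since the question notes that Count is always 1, the same frequency can probably be obtained without relying on that column at all. Below is a minimal sketch, assuming the same sample data as above; the column name `frequency` is made up for illustration and does not come from the original post.

```python
import pandas as pd

# Same sample rows as above, but without the constant Count column
df = pd.DataFrame({
    'category_cluster': ['Assault'] * 7,
    'Country': ['Egypt', 'India', 'India', 'Mexico', 'Mexico', 'Mexico', 'Morocco'],
})

# 'size' counts the rows in each (category_cluster, Country) group;
# transform broadcasts that count back onto every row of the group,
# so the dataframe keeps all of its original rows.
df['frequency'] = (
    df.groupby(['category_cluster', 'Country'])['Country'].transform('size')
)

print(df)
```

With this data, both India rows should get 2, the three Mexico rows 3, and Egypt and Morocco 1 each, which matches what summing a column of ones gives.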

