Count frequency based on two columns without group by

Question

I have a dataset with 3 columns: Category, Country, and Count (which is always 1 - and is pretty useless, actually).

What I want to achieve is something like the yellow column here:

Image 1: what I want

I could do a simple group by in Python, but that's not what I want: I want to preserve the individual rows of the data, unlike the image below, which groups them:

Image 2: what I did, which is not what I want (grouped)

I just want the frequency based on both columns, without grouping the rows. Any ideas? I thought about iterating with for loops, but I couldn't get it to work. I'm a beginner in Python, so your help is deeply appreciated.

Answer 1

It seems like you want to use transform here. It computes the grouped summary statistic you are looking for but returns a result aligned to the original rows, so you can store it as a new column in your dataframe without collapsing the groups.

import pandas as pd

# Sample data mirroring the question: one row per record, 'Count' is always 1.
df = pd.DataFrame({'category_cluster' : ['Assault', 'Assault', 'Assault', 'Assault', 'Assault', 'Assault', 'Assault'],
                   'Country': ['Egypt', 'India', 'India', 'Mexico', 'Mexico', 'Mexico', 'Morocco'],
                   'Count' : [1, 1, 1, 1, 1, 1, 1]})

# Sum 'Count' within each (category, country) pair and broadcast it back to every row.
df['new_column'] = df.groupby(['category_cluster', 'Country'])['Count'].transform('sum')
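
Since the question notes that the Count column is always 1, a variant that does not rely on it may be useful. The sketch below assumes the same sample data and uses transform('size') to count the rows in each group; the column name 'frequency' is a hypothetical choice for illustration. It also builds the collapsed group-by result for contrast, which is the shape the asker wanted to avoid.

```python
import pandas as pd

# Same sample data as in the answer, minus the constant 'Count' column.
df = pd.DataFrame({
    'category_cluster': ['Assault'] * 7,
    'Country': ['Egypt', 'India', 'India', 'Mexico', 'Mexico', 'Mexico', 'Morocco'],
})

# 'frequency' is a hypothetical column name chosen for this sketch.
# transform('size') yields one value per original row: the size of that row's group.
df['frequency'] = (
    df.groupby(['category_cluster', 'Country'])['Country'].transform('size')
)

# For contrast: a plain aggregation collapses the data to one row per group.
grouped = (
    df.groupby(['category_cluster', 'Country'])
      .size()
      .reset_index(name='frequency')
)

print(df)       # still 7 rows, each carrying its group's frequency
print(grouped)  # 4 rows, one per (category_cluster, Country) pair
```

Either form should give the same numbers as summing Count here, because every Count is 1. If Count could take other values, sum and size would differ.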

Answer 1: score -1, accepted, 2019-11-20 16:07:08
