I want to return the counts from a value_counts of col2 back to the original dataframe after a pandas groupby on col1.
i.e. I have...
col1 col2
0 1111 A
1 1111 B
2 1111 B
3 1111 B
4 1111 C
5 2222 A
6 2222 B
7 2222 C
8 2222 C
and I'd like...
col1 col2 col3
0 1111 A 1
1 1111 B 3
2 1111 B 3
3 1111 B 3
4 1111 C 1
5 2222 A 1
6 2222 B 1
7 2222 C 2
8 2222 C 2
I can get the values of col3 by grouping on col1 and passing each group's col2 values into value_counts, but I'm not sure how to then get them back into the dataframe.
Example:
import pandas as pd
d1 = {'col1': ['1111', '1111', '1111', '1111', '1111', '2222', '2222', '2222', '2222'],
'col2': ['A', 'B', 'B', 'B', 'C', 'A', 'B', 'C', 'C']}
df1 = pd.DataFrame(data=d1)
d2 = {'col1': ['1111', '1111', '1111', '1111', '1111', '2222', '2222', '2222', '2222'],
'col2': ['A', 'B', 'B', 'B', 'C', 'A', 'B', 'C', 'C'],
'col3': [1, 3, 3, 3, 1, 1, 1, 2, 2]}
df2 = pd.DataFrame(data=d2)
print(df1)
print(df2)
counts = df1.groupby('col1').apply(lambda x: x.col2.value_counts()[x.col2])
print(counts)
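The apply above returns counts indexed by the col2 labels rather than by df1's original row index, which is why the result does not line up with df1. A minimal sketch of one way to keep the row alignment (not taken from the answers below): inside transform, map each group's col2 values through that same group's value_counts. Series.map preserves the group's original index, so transform can put the result straight back onto df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    'col1': ['1111'] * 5 + ['2222'] * 4,
    'col2': ['A', 'B', 'B', 'B', 'C', 'A', 'B', 'C', 'C'],
})

# For each col1 group, look up every col2 value in that group's
# value_counts; map keeps the group's original row labels, so
# transform aligns the result back onto the rows of df1.
df1['col3'] = df1.groupby('col1')['col2'].transform(lambda s: s.map(s.value_counts()))
print(df1)
```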
You can do this with groupby and transform:
df1['col3'] = df1.groupby(['col1','col2'])['col2'].transform('count')
print(df1)
col1 col2 col3
0 1111 A 1
1 1111 B 3
2 1111 B 3
3 1111 B 3
4 1111 C 1
5 2222 A 1
6 2222 B 1
7 2222 C 2
8 2222 C 2
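A note on the choice of 'count': it counts non-missing values in the selected column, whereas 'size' counts rows. Here they agree because col2 is a grouping key, but they can differ when you count a column that has gaps. The sketch below uses a hypothetical extra column 'val' and hypothetical output names 'n_rows' and 'n_valid' purely to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'val' has one missing value in the ('1111', 'B') group.
df = pd.DataFrame({
    'col1': ['1111', '1111', '1111', '2222'],
    'col2': ['B', 'B', 'B', 'C'],
    'val': [10.0, np.nan, 30.0, 40.0],
})

g = df.groupby(['col1', 'col2'])
df['n_rows'] = g['val'].transform('size')    # counts every row in the group
df['n_valid'] = g['val'].transform('count')  # skips NaN in 'val'
print(df)
```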
I'm not sure this is optimal, but here's my attempt. After reading @Terry's comment using .transform('count'), mine feels like counting on my fingers:
import pandas as pd
d1 = {'col1': ['1111', '1111', '1111', '1111', '1111', '2222', '2222', '2222', '2222'],
'col2': ['A', 'B', 'B', 'B', 'C', 'A', 'B', 'C', 'C']}
df1 = pd.DataFrame(data=d1)
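# Count rows per (col1, col2) pair; rename the Series to 'col3' before
# reset_index so its name doesn't clash with the 'col1' index level
df_aux = df1.groupby(['col1','col2'])['col1'].count().rename('col3')
df_aux = df_aux.reset_index()
# Left merge keeps every row of df1 and attaches its pair's count
df_output = df1.merge(df_aux, how='left', on=['col1','col2'])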
print(df_output)
Output:
col1 col2 col3
0 1111 A 1
1 1111 B 3
2 1111 B 3
3 1111 B 3
4 1111 C 1
5 2222 A 1
6 2222 B 1
7 2222 C 2
8 2222 C 2
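The same merge idea can be written a little more compactly. The sketch below is a variation, not the answerer's exact code: it uses groupby(...).size() with reset_index(name='col3') to build the lookup table in one step, and passes validate='many_to_one' so that merge raises if the lookup table ever had duplicate keys (which would otherwise silently multiply rows of df1).

```python
import pandas as pd

df1 = pd.DataFrame({
    'col1': ['1111'] * 5 + ['2222'] * 4,
    'col2': ['A', 'B', 'B', 'B', 'C', 'A', 'B', 'C', 'C'],
})

# One row per (col1, col2) pair, with the number of rows in that pair.
df_aux = df1.groupby(['col1', 'col2']).size().reset_index(name='col3')

# A left merge keeps df1's row order; validate guards against duplicate keys.
df_output = df1.merge(df_aux, how='left', on=['col1', 'col2'], validate='many_to_one')
print(df_output)
```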
Here is another approach; just run it in your notebook:
import pandas as pd
dictionary1={ 'col1':[1111,1111,1111,1111,1111,2222,2222,2222,2222],
'col2':['A','B','B','B','C','A','B','C','C']
}
df1=pd.DataFrame(dictionary1)
# Count rows per (col1, col2) pair and name the result 'col3'
d = df1.groupby(['col1','col2'])['col2'].count().rename('col3')
d.to_frame()
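That gives one row per (col1, col2) pair rather than one row per original record. As a sketch (an addition, not part of this answer), the per-pair counts can be attached back to every original row with DataFrame.join, using on= to match df1's col1 and col2 columns against the levels of the count Series' MultiIndex.

```python
import pandas as pd

df1 = pd.DataFrame({
    'col1': [1111] * 5 + [2222] * 4,
    'col2': ['A', 'B', 'B', 'B', 'C', 'A', 'B', 'C', 'C'],
})

# Per-(col1, col2) counts: a Series named 'col3' with a (col1, col2) MultiIndex.
d = df1.groupby(['col1', 'col2'])['col2'].count().rename('col3')

# join(on=...) looks up each row's (col1, col2) pair in d's index,
# so every original row receives the count for its pair.
df_out = df1.join(d, on=['col1', 'col2'])
print(df_out)
```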