I've been trying to use collections.Counter or value_counts in Python 3.7 to build something like the DataFrame below, but without success. Here is an example of what I'm trying to get:
IDs Col2 Col3
0 123 [A, A, B, B, C] {A:2, B:2, C:1}
1 456 [A, B, C, C] {A:1, B:1, C:2}
2 789 [A, A, A, D, D] {A:3, D:2}
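Building Col3 itself only needs collections.Counter applied to each list. A minimal stdlib-only sketch, assuming the three Col2 lists are typed in by hand as `col2` (a name used only for illustration):

```python
from collections import Counter

# The three Col2 lists from the example, typed in by hand (illustrative only)
col2 = [list('AABBC'), list('ABCC'), list('AAADD')]

# Counter maps each element to the number of times it occurs in the list;
# since Python 3.7 it keeps keys in first-encountered order
col3 = [Counter(items) for items in col2]

for counts in col3:
    print(dict(counts))
```

In pandas the same step is `df['Col2'].apply(Counter)`, which is what the answer uses.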
Then, for each row, I need the maximum value in Col3 in a new column; if there's a tie, the new column should contain only the keys that tied. Something like this:
IDs Col2 Col3 Max
0 123 [A, A, B, B, C] {A:2, B:2, C:1} {A:2, B:2}
1 456 [A, B, C, C] {A:1, B:1, C:2} {C:2}
2 789 [A, A, A, D, D] {A:3, D:2} {A:3}
Use a dict comprehension that keeps only the items whose value equals the maximum:
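import pandas as pd
from collections import Counter
df = pd.DataFrame({'Col1':[123,456,789],
                   'Col2':[list('AABBC'), list('ABCC'), list('AAADD')]})
# Counter gives a per-row dict of element counts
df['Col3'] = df['Col2'].apply(Counter)
# keep only the keys whose count equals the row maximum (all tied keys are kept)
df['Max'] = df['Col3'].apply(lambda x: {k:v for k, v in x.items() if max(x.values()) == v})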
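The lambda above recomputes `max(x.values())` for every item and raises ValueError if a row's list is empty. A minimal sketch of a named helper that computes the maximum once and returns an empty dict for empty rows; `keys_with_max_count` is a hypothetical name, not part of pandas or collections:

```python
from collections import Counter

def keys_with_max_count(counts):
    """Return every (key, count) pair whose count equals the maximum.

    Hypothetical helper: ties are all kept, and an empty Counter
    returns {} instead of letting max() raise ValueError.
    """
    if not counts:
        return {}
    top = max(counts.values())  # computed once, not once per item
    return {k: v for k, v in counts.items() if v == top}

print(keys_with_max_count(Counter('AABBC')))  # {'A': 2, 'B': 2}
print(keys_with_max_count(Counter('ABCC')))   # {'C': 2}
print(keys_with_max_count(Counter('AAADD')))  # {'A': 3}
print(keys_with_max_count(Counter()))         # {}
```

With the DataFrame above it would be used as `df['Max'] = df['Col3'].apply(keys_with_max_count)`.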
Thanks to @Keyur Potdar for another idea, using Counter.most_common:
# most_common(1)[0][1] is the highest count; keep every key that reaches it
f = lambda x: {k:v for k, v in x.items() if x.most_common(1)[0][1] == v}
df['Max'] = df['Col3'].apply(f)
print(df)
Col1 Col2 Col3 Max
0 123 [A, A, B, B, C] {'A': 2, 'B': 2, 'C': 1} {'A': 2, 'B': 2}
1 456 [A, B, C, C] {'A': 1, 'B': 1, 'C': 2} {'C': 2}
2 789 [A, A, A, D, D] {'A': 3, 'D': 2} {'A': 3}
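The filter in the most_common version matters because `most_common(1)` returns a single pair even when several keys tie; the documented behaviour is that elements with equal counts come back in first-encountered order. A small sketch on the first row's data:

```python
from collections import Counter

counts = Counter('AABBC')

# most_common(1) returns only one (key, count) pair even though A and B tie;
# equal counts keep first-encountered order, so 'A' is the one returned
print(counts.most_common(1))  # [('A', 2)]

# Its count is still the maximum, so filtering on it recovers all tied keys
top = counts.most_common(1)[0][1]
print({k: v for k, v in counts.items() if v == top})  # {'A': 2, 'B': 2}
```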