
Count occurrences in a list for each row and specific column in a dataframe

I've been trying to use collections.Counter or value_counts in Python 3.7 to build something like the df below, without success. This is an example of what I'm trying to get:

    IDs        Col2               Col3
0   123   [A, A, B, B, C]    {A:2, B:2, C:1}
1   456   [A, B, C, C]       {A:1, B:1, C:2}
2   789   [A, A, A, D, D]    {A:3, D:2}

Then, for each corresponding row, I need the maximum value in Col3 and, if there's a tie, a new column showing only the keys that tied. Something like this:

    IDs        Col2               Col3            Max
0   123   [A, A, B, B, C]    {A:2, B:2, C:1}   {A:2, B:2}
1   456   [A, B, C, C]       {A:1, B:1, C:2}   {C:2}
2   789   [A, A, A, D, D]    {A:3, D:2}        {A:3}

Use a dict comprehension that keeps only the items whose value equals the maximum:

from collections import Counter

import pandas as pd

df = pd.DataFrame({'Col1':[123,456,789],
                   'Col2':[list('AABBC'), list('ABCC'), list('AAADD')]})

# count the items of each list
df['Col3'] = df['Col2'].apply(Counter)
# keep only the keys whose count equals the row maximum
df['Max'] = df['Col3'].apply(lambda x: {k:v for k, v in x.items() if max(x.values()) == v})

Thanks to @Keyur Potdar for another idea, using most_common:

f = lambda x: {k:v for k, v in x.items() if x.most_common(1)[0][1] == v}
df['Max'] = df['Col3'].apply(f)

print(df)
   Col1             Col2                      Col3               Max
0   123  [A, A, B, B, C]  {'A': 2, 'B': 2, 'C': 1}  {'A': 2, 'B': 2}
1   456     [A, B, C, C]  {'A': 1, 'B': 1, 'C': 2}          {'C': 2}
2   789  [A, A, A, D, D]          {'A': 3, 'D': 2}          {'A': 3}
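
Both lambdas above look up the maximum again for every key, and both raise an error on an empty list (max() of an empty sequence raises ValueError, and most_common(1)[0] raises IndexError). If that matters for your data, here is a minimal sketch of a variant that computes the maximum once per row and returns empty dicts for an empty list. The helper name top_counts and the extra row with an empty list are only illustrative and not part of the original question; note that Col3 holds plain dicts here rather than Counter objects.

```python
from collections import Counter

import pandas as pd


def top_counts(items):
    """Return (counts, tied_max) as plain dicts for one list of items."""
    counts = Counter(items)
    if not counts:
        # Empty list: nothing to count, so there is no maximum either.
        return {}, {}
    highest = max(counts.values())  # computed once per row
    tied = {key: n for key, n in counts.items() if n == highest}
    return dict(counts), tied


# Same data as above, plus a hypothetical row with an empty list.
df = pd.DataFrame({'Col1': [123, 456, 789, 999],
                   'Col2': [list('AABBC'), list('ABCC'), list('AAADD'), []]})

results = [top_counts(items) for items in df['Col2']]
df['Col3'] = [counts for counts, _ in results]
df['Max'] = [tied for _, tied in results]

print(df)
```

For the three rows from the question, this should give the same Col3 and Max values as the apply-based versions.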
