
Pandas groupby into unique sets and then value_counts

I was wondering whether there is a better way to do this. I have some categories, and I want to find the unique combination of categories for each val, then count how many times each combination occurs. Having to include astype(str) irks me.

import pandas as pd

df = pd.DataFrame(
    {
        'cat': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b'],
        'val': [1, 1, 2, 2, 3, 4, 5, 5]
    }
)

df.groupby('val')['cat'].apply(lambda x: set(x)).astype(str).value_counts() 

Out:

{'a', 'b'}    2
{'c', 'a'}    1
{'b'}         1
{'c'}         1
Name: cat, dtype: int64 

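The following does not give the desired result, because unique() returns an array for each group and arrays are not hashable, so the two [a, b] groups are counted separately: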

df.groupby('val')['cat'].unique().value_counts()

Out:

[b]       1
[c, a]    1
[a, b]    1
[c]       1
[a, b]    1 
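
As a minimal sketch of why the array version cannot be grouped by value while a set-like version can, the snippet below checks hashability directly. The variable names are hypothetical and only for illustration.

```python
import numpy as np

# The kind of object that groupby(...).unique() produces for one group.
arr = np.array(['a', 'b'], dtype=object)

# NumPy arrays are mutable and define no hash, so they cannot serve as keys.
try:
    hash(arr)
    array_hashable = True
except TypeError:
    array_hashable = False

# A frozenset is hashable and ignores element order.
same_set = frozenset(['a', 'b']) == frozenset(['b', 'a'])
same_hash = hash(frozenset(['a', 'b'])) == hash(frozenset(['b', 'a']))

print(array_hashable, same_set, same_hash)
```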

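You can use GroupBy.agg with tuple or frozenset, since both are hashable, and then call value_counts on the result. Note that tuple preserves element order, so (a, c) and (c, a) would be counted as different combinations, whereas frozenset ignores order.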

df.groupby('val').agg(tuple).value_counts()
# df.groupby('val').agg(frozenset).value_counts() works fine too.

cat   
(a, b)    2
(a, c)    1
(b)       1
(c)       1
dtype: int64
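
For reference, here is a self-contained sketch of the frozenset variant applied to the single 'cat' column, using the sample data from the question. Converting the result to a plain dict is an assumption made here for easy lookup and is not part of the original answer. The lookup goes through the dict rather than the Series because pandas may treat a frozenset passed as an indexer as a list of labels.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'cat': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b'],
        'val': [1, 1, 2, 2, 3, 4, 5, 5],
    }
)

# Collapse each 'val' group into a hashable, order-insensitive frozenset.
combos = df.groupby('val')['cat'].agg(frozenset)

# Count how many groups share each combination of categories.
counts = combos.value_counts()

# Plain dict keyed by frozenset, so lookups do not go through the pandas indexer.
counts_by_combo = counts.to_dict()

print(counts_by_combo[frozenset({'a', 'b'})])
```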

