
Python Pandas calculate value_counts of two columns and use groupby

I have a dataframe:

import pandas as pd

data = {'label': ['cat', 'dog', 'dog', 'cat', 'cat'],
        'breeds': ['bengal', 'shar pei', 'pug', 'maine coon', 'maine coon'],
        'nicknames': [['Loki', 'Loki'], ['Max'], ['Toby', 'Zeus ', 'Toby'], ['Marty'], ['Erin ', 'Erin']],
        'eye color': [['blue', 'green'], ['green'], ['brown', 'brown', 'brown'], ['blue'], ['green', 'brown']]}
frame = pd.DataFrame(data)

Output:

   label  breeds      nicknames            eye color
0  cat    bengal      [Loki, Loki]         [blue, green]
1  dog    shar pei    [Max]                [green]
2  dog    pug         [Toby, Zeus , Toby]  [brown, brown, brown]
3  cat    maine coon  [Marty]              [blue]
4  cat    maine coon  [Erin , Erin]        [green, brown]

I want to group frame by ['label', 'breeds'] and count the unique values of nicknames and eye color within each group, with the results in two separate columns: 'nickname_count' and 'eye_count'. The code below puts everything into a single column. How do I output the two counts separately?

frame2 = frame.groupby(['label', 'breeds'])[['nicknames', 'eye color']].apply(lambda x: x.astype('str').value_counts().to_dict())

First, group by the two key columns and aggregate both list columns with sum, because summing lists concatenates them (df here is the dataframe called frame in the question):

>>> df_grouped = df.groupby(['label', 'breeds']).agg({'nicknames': sum, 'eye color': sum}).reset_index()
>>> df_grouped
    label   breeds      nicknames               eye color
0   cat     bengal      [Loki, Loki]            [blue, green]
1   cat     maine coon  [Marty, Erin , Erin]    [blue, green, brown]
2   dog     pug         [Toby, Zeus , Toby]     [brown, brown, brown]
3   dog     shar pei    [Max]                   [green]
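To see why sum merges the lists, here is a minimal pure-Python sketch of what happens inside one group. It uses the two 'maine coon' rows from the question; the variable names are only illustrative. Note that the built-in sum needs an explicit empty-list start value when used on lists directly, and itertools.chain is shown as an equivalent alternative.

```python
from itertools import chain

# The 'nicknames' values of the two rows in the ('cat', 'maine coon') group.
group_values = [['Marty'], ['Erin ', 'Erin']]

# Adding lists with + concatenates them, so summing them joins every row's list.
# The built-in sum starts from 0 by default, hence the explicit [] start value.
merged_with_sum = sum(group_values, [])

# Equivalent result without relying on list addition.
merged_with_chain = list(chain.from_iterable(group_values))

print(merged_with_sum)
print(merged_with_chain)
```

The trailing space in 'Erin ' is kept as-is, which matters later because it makes 'Erin ' and 'Erin' two different values.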

Then, count the unique values in each list by converting it to a set and taking its length, and store the results in two new columns to get the expected output:

>>> df_grouped['nickname_count'] = df_grouped['nicknames'].apply(lambda x: len(set(x)))
>>> df_grouped['eye_count'] = df_grouped['eye color'].apply(lambda x: len(set(x)))
>>> df_grouped
    label   breeds      nicknames               eye color               nickname_count  eye_count
0   cat     bengal      [Loki, Loki]            [blue, green]           1               2
1   cat     maine coon  [Marty, Erin , Erin]    [blue, green, brown]    3               3
2   dog     pug         [Toby, Zeus , Toby]     [brown, brown, brown]   2               1
3   dog     shar pei    [Max]                   [green]                 1               1
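The two steps above can also be folded into a single groupby call. The following is a sketch, not part of the original answer: it assumes pandas named aggregation (available since pandas 0.25) and a hypothetical helper, count_unique, that flattens the lists of one group and counts the distinct items.

```python
from itertools import chain

import pandas as pd

frame = pd.DataFrame({
    'label': ['cat', 'dog', 'dog', 'cat', 'cat'],
    'breeds': ['bengal', 'shar pei', 'pug', 'maine coon', 'maine coon'],
    'nicknames': [['Loki', 'Loki'], ['Max'], ['Toby', 'Zeus ', 'Toby'],
                  ['Marty'], ['Erin ', 'Erin']],
    'eye color': [['blue', 'green'], ['green'], ['brown', 'brown', 'brown'],
                  ['blue'], ['green', 'brown']],
})


def count_unique(lists):
    """Hypothetical helper: number of distinct items across all lists in a group."""
    return len(set(chain.from_iterable(lists)))


# Named aggregation: each keyword becomes an output column,
# given as (source column, aggregation function).
result = (
    frame.groupby(['label', 'breeds'])
    .agg(nickname_count=('nicknames', count_unique),
         eye_count=('eye color', count_unique))
    .reset_index()
)
print(result)
```

This yields only the key columns and the two count columns, without the intermediate concatenated lists. As in the answer above, 'Erin ' and 'Erin' are counted as different nicknames because of the trailing space; if that is not intended, the strings would need to be stripped before counting.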

