Python: Can you check how many times a unique combination of two column values appears in another dataframe?

Question

I am trying to see how many times each unique combination of two column values appears in another dataframe, and to add that count as a new column in one line. I have a reference table listing the unique combinations of the ID and Desc fields. I also have a table containing all active occurrences of those combinations:

     ref_table                               active_data
   ID      Desc                         ID         Desc
0   1     Windows                    0   1        Windows
1   1     Linux                      1   1        Windows
2   2     Linux                      2   1        Linux
3   3     Network                    3   2        Linux
4   4     Automation                 4   3        Network
                                     5   3        Network
                                     6   3        Network
                                     7   4        Automation
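To experiment with the answers, the two example tables can be rebuilt as pandas DataFrames. This is a minimal sketch that only transcribes the tables above; the variable names match the question.

```python
import pandas as pd

# Reference table: one row per (ID, Desc) combination of interest
ref_table = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4],
    "Desc": ["Windows", "Linux", "Linux", "Network", "Automation"],
})

# Active occurrences: a combination may appear any number of times
active_data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 3, 3, 3, 4],
    "Desc": ["Windows", "Windows", "Linux", "Linux",
             "Network", "Network", "Network", "Automation"],
})

print(ref_table)
print(active_data)
```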

I'd like to add to ref_table the number of times each ID/Desc combination appears in active_data, like so:

         ref_table                              
   ID      Desc        Count                  
0   1     Windows        2   
1   1     Linux          1              
2   2     Linux          1            
3   3     Network        3          
4   4     Automation     1

I recognize this can be accomplished with pd.merge or join. However, I would like to do it in one line if possible. If I were only concerned with the count of a single column like ID, I know it can be done with:

ref_table['Count'] = ref_table['ID'].map(active_data['ID'].value_counts())
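The single-column version works because value_counts() returns a Series indexed by ID, which map() then uses as a lookup table. The sketch below, on the example data, also shows why it is not enough here: both ID 1 rows receive 3, because Windows and Linux occurrences of ID 1 are counted together.

```python
import pandas as pd

ref_table = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4],
    "Desc": ["Windows", "Linux", "Linux", "Network", "Automation"],
})
active_data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 3, 3, 3, 4],
    "Desc": ["Windows", "Windows", "Linux", "Linux",
             "Network", "Network", "Network", "Automation"],
})

# value_counts() gives a Series: index = ID, values = number of occurrences
id_counts = active_data["ID"].value_counts()

# map() looks up each ref_table ID in that Series
ref_table["Count"] = ref_table["ID"].map(id_counts)
print(ref_table)
# Count column is [3, 3, 1, 3, 1]: ID 1 lumps Windows and Linux together
```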

Trying to extend this to both the ID and Desc columns with:

ref_table['Count'] = ref_table[['ID', 'Desc']].apply(active_data[['ID', 'Desc']].value_counts())

produces an error: KeyError: "None of [Index([3, 'Network'], dtype='object')] are in the [index]". Ideally I would like to use the value_counts solution, but I cannot figure out how to make it work with two columns.
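One way to keep the value_counts idea with two columns is to count the pairs with DataFrame.value_counts (this assumes pandas 1.1 or later), convert the result to a dictionary keyed by (ID, Desc) tuples, and look each reference row up in it. This is a sketch rather than the only option; returning 0 for combinations that never occur in active_data is a choice made here, not something the question specifies.

```python
import pandas as pd

ref_table = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4],
    "Desc": ["Windows", "Linux", "Linux", "Network", "Automation"],
})
active_data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 3, 3, 3, 4],
    "Desc": ["Windows", "Windows", "Linux", "Linux",
             "Network", "Network", "Network", "Automation"],
})

# Count each (ID, Desc) pair, then convert to a dict keyed by tuples
pair_counts = active_data.value_counts(["ID", "Desc"]).to_dict()

# Look up each reference pair; combinations absent from active_data get 0
ref_table["Count"] = [pair_counts.get((i, d), 0)
                      for i, d in zip(ref_table["ID"], ref_table["Desc"])]
print(ref_table)
```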

Answer 1

You can merge ref_table with a groupby count of active_data:

ref_table.merge(active_data.groupby(['ID','Desc']).size().rename('Count').reset_index(),
                on=['ID','Desc'], how='left')
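A self-contained run of the merge-on-groupby idea on the example data. It counts each pair with groupby(...).size() and fills 0 for any combination the left merge cannot match; with this data every combination matches, so the fill is a precaution rather than a necessity.

```python
import pandas as pd

ref_table = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4],
    "Desc": ["Windows", "Linux", "Linux", "Network", "Automation"],
})
active_data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 3, 3, 3, 4],
    "Desc": ["Windows", "Windows", "Linux", "Linux",
             "Network", "Network", "Network", "Automation"],
})

# One row per (ID, Desc) pair with its number of occurrences
counts = (active_data.groupby(["ID", "Desc"]).size()
          .rename("Count").reset_index())

# A left merge keeps every ref_table row in its original order
result = ref_table.merge(counts, on=["ID", "Desc"], how="left")

# Combinations missing from active_data come back as NaN; report them as 0
result["Count"] = result["Count"].fillna(0).astype(int)
print(result)
```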

Or you can merge first, then groupby:

(ref_table.merge(active_data, on=['ID','Desc'], how='left')
     .groupby(['ID','Desc'], sort=False).size()   # sort=False keeps ref_table's row order
     .reset_index(name='Count')                   # name= sets the count column's name
)
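Because a left merge keeps the ID and Desc values of every ref_table row, counting rows per (ID, Desc) group after the merge reports 1, not 0, for a combination that never appears in active_data. A minimal sketch of one workaround: tag the active rows with a marker column before merging and count that column, which is NaN for unmatched rows. The column name `hit` and the extra ref_table row (5, 'Storage') are hypothetical, added only to show the unmatched case.

```python
import pandas as pd

ref_table = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4, 5],
    # "Storage" (ID 5) is a hypothetical row with no match in active_data
    "Desc": ["Windows", "Linux", "Linux", "Network", "Automation", "Storage"],
})
active_data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 3, 3, 3, 4],
    "Desc": ["Windows", "Windows", "Linux", "Linux",
             "Network", "Network", "Network", "Automation"],
})

result = (
    # hypothetical marker column: 1 on every active row, NaN after a failed match
    ref_table.merge(active_data.assign(hit=1), on=["ID", "Desc"], how="left")
    # count() skips NaN, so unmatched combinations get 0
    .groupby(["ID", "Desc"], sort=False)["hit"].count()
    .reset_index(name="Count")
)
print(result)
```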

Answer 1 (score 2, posted 2020-11-23 16:13:35)