Python: Can you check how many times a unique combination of two column values appears in another dataframe?
I am trying to see how many times each unique combination of two column values appears in another dataframe, and to add that count as a new column in one line. I have a reference table holding the unique combinations of the ID and Desc fields. I also have a table that contains all active occurrences of those combinations:
    ref_table               active_data
       ID  Desc                ID  Desc
    0   1  Windows          0   1  Windows
    1   1  Linux            1   1  Windows
    2   2  Linux            2   1  Linux
    3   3  Network          3   2  Linux
    4   4  Automation       4   3  Network
                            5   3  Network
                            6   3  Network
                            7   4  Automation
I'd like to add to ref_table the count of each unique combination of the ID and Desc fields that appears in active_data, like so:
    ref_table
       ID  Desc        Count
    0   1  Windows         2
    1   1  Linux           1
    2   2  Linux           1
    3   3  Network         3
    4   4  Automation      1
I recognize this can be accomplished with pd.merge or join. However, if possible, I would like to do it in one line. If I were only concerned with the count of a single column such as ID, I know it can be done with:

    ref_table['Count'] = ref_table['ID'].map(active_data['ID'].value_counts())
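To make the single-column version concrete, here is a minimal self-contained sketch that rebuilds the two sample tables from the question. value_counts() returns a Series indexed by the distinct ID values, and map looks up each ref_table ID in that index. The variable names id_counts and single are only illustrative.

```python
import pandas as pd

# Sample data copied from the tables in the question.
ref_table = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4],
    "Desc": ["Windows", "Linux", "Linux", "Network", "Automation"],
})
active_data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 3, 3, 3, 4],
    "Desc": ["Windows", "Windows", "Linux", "Linux",
             "Network", "Network", "Network", "Automation"],
})

# Series whose index holds the distinct IDs and whose values are their counts.
id_counts = active_data["ID"].value_counts()

# Look up every ref_table ID in that index.
single = ref_table["ID"].map(id_counts)
print(single.tolist())  # [3, 3, 1, 3, 1]
```

Because only ID is used as the key, both ID 1 rows (Windows and Linux) receive the same count of 3, which is why the two-column case needs a different lookup.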
Trying to extend this to look at both the ID and Desc columns using:

    ref_table['Count'] = ref_table[['ID', 'Desc']].apply(active_data[['ID', 'Desc']].value_counts())

produces an error:

    KeyError: "None of [Index([3, 'Network'], dtype='object')] are in the [index]"

Ideally I would like to use the value_counts solution, but I cannot figure out how to make it work with two columns.
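One possible way to stay with value_counts for two columns is sketched below. It is not taken from the answer that follows. It assumes pandas 1.1 or later (where DataFrame.value_counts is available), and the names pair_counts and keys are illustrative. The idea is that value_counts on two columns returns a Series with a two-level MultiIndex, so the lookup keys from ref_table need to be a MultiIndex as well.

```python
import pandas as pd

# Sample data copied from the tables in the question.
ref_table = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4],
    "Desc": ["Windows", "Linux", "Linux", "Network", "Automation"],
})
active_data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 3, 3, 3, 4],
    "Desc": ["Windows", "Windows", "Linux", "Linux",
             "Network", "Network", "Network", "Automation"],
})

# Count each (ID, Desc) pair; the result is indexed by a MultiIndex of pairs.
pair_counts = active_data[["ID", "Desc"]].value_counts()

# Build the matching MultiIndex from ref_table, in ref_table's row order.
keys = pd.MultiIndex.from_frame(ref_table[["ID", "Desc"]])

# Align the counts to those keys; pairs missing from active_data become 0.
ref_table["Count"] = pair_counts.reindex(keys, fill_value=0).to_numpy()
print(ref_table)
```

The three steps can be chained into a single assignment if a one-liner is required, at some cost in readability.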
You can do a merge on groupby:

    # Count rows per (ID, Desc) pair, then left-join the counts onto ref_table
    counts = active_data.groupby(['ID','Desc']).size().reset_index(name='Count')
    ref_table.merge(counts, on=['ID','Desc'], how='left')
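Or you can merge, then groupby:

    # Note: a ref_table pair with no match in active_data still yields one
    # merged row, so it is counted as 1 rather than 0.
    (ref_table.merge(active_data, on=['ID','Desc'], how='left')
        .groupby(['ID','Desc'], sort=False)['ID'].count()
        .reset_index(name='Count')
    )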
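The merge-based idea can be tried end to end with the sketch below, which rebuilds the sample tables from the question. As an assumption for illustration only, ref_table gets one extra hypothetical row (5, "Storage") that never occurs in active_data, to show how a missing combination can be reported as 0 rather than NaN. The names counts and result are also illustrative.

```python
import pandas as pd

ref_table = pd.DataFrame({
    # The last row (5, "Storage") is hypothetical and not in the question.
    "ID": [1, 1, 2, 3, 4, 5],
    "Desc": ["Windows", "Linux", "Linux", "Network", "Automation", "Storage"],
})
active_data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 3, 3, 3, 4],
    "Desc": ["Windows", "Windows", "Linux", "Linux",
             "Network", "Network", "Network", "Automation"],
})

# One row per (ID, Desc) pair, with the number of occurrences in 'Count'.
counts = active_data.groupby(["ID", "Desc"]).size().reset_index(name="Count")

# A left join keeps every ref_table row in its original order.
result = ref_table.merge(counts, on=["ID", "Desc"], how="left")

# Pairs absent from active_data come back as NaN; report them as 0.
result["Count"] = result["Count"].fillna(0).astype(int)
print(result)
```

Aggregating before merging keeps the joined table at one row per ref_table entry, so no second groupby is needed afterwards.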