How to compare two DataFrames and return a matrix of values where the columns match

Question

I have two data frames as follows:

df1

   id  start  end  label    item
0   1      0    3   food  burger
1   1      4    6  drink    cola
2   2      0    3   food   fries

df2 

   id  start  end  label    item
0   1      0    3   food  burger
1   1      4    6   food    cola
2   2      0    3  drink   fries

I would like to compare the two data frames (by checking where they match in the id, start and end columns) and, for each id, create a matrix of size 2 x (number of items). Each cell should contain the label for the corresponding item: the first row holds the labels from df1 and the second row the labels from df2. In this example:

M_id1: [[food, drink],      M_id2: [[food], 
        [food, food]]               [drink]]

I looked through the pandas documentation but didn't find anything that could help me.

Answer 1

You can merge df1 and df2 on the columns id, start and end, then group the merged dataframe by id. Inside a dict comprehension, create one key-value pair per group, where the key is built from the id and the value is the corresponding matrix of labels:

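# Inner merge on the key columns; the overlapping label/item columns get _x/_y suffixes
m = df1.merge(df2, on=['id', 'start', 'end'])
# For each id, take label_x and label_y and transpose into a 2 x (number of items) array
dct = {f'M_id{k}': g.filter(like='label').to_numpy().T for k, g in m.groupby('id')}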

To access the matrix of labels use dictionary lookup:要访问标签矩阵，请使用字典查找：

>>> dct['M_id1']
array([['food', 'drink'],
       ['food', 'food']], dtype=object)

>>> dct['M_id2']
array([['food'],
       ['drink']], dtype=object)
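
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch that rebuilds the two frames from the question and applies the same merge-then-group idea. The explicit suffixes `_df1` and `_df2` and the names `merged` and `matrices` are my own choices for readability, not part of the original answer; the answer relies on the default `_x`/`_y` suffixes and `filter(like='label')` instead.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id": [1, 1, 2],
    "start": [0, 4, 0],
    "end": [3, 6, 3],
    "label": ["food", "drink", "food"],
    "item": ["burger", "cola", "fries"],
})
df2 = pd.DataFrame({
    "id": [1, 1, 2],
    "start": [0, 4, 0],
    "end": [3, 6, 3],
    "label": ["food", "food", "drink"],
    "item": ["burger", "cola", "fries"],
})

# The default inner merge keeps only rows whose (id, start, end)
# combination appears in both frames. Explicit suffixes (an assumption
# made here for clarity) tell the two label columns apart.
merged = df1.merge(df2, on=["id", "start", "end"], suffixes=("_df1", "_df2"))

# For each id, pick the two label columns and transpose, so that row 0
# holds the df1 labels and row 1 holds the df2 labels.
matrices = {
    f"M_id{key}": group[["label_df1", "label_df2"]].to_numpy().T
    for key, group in merged.groupby("id")
}

print(matrices["M_id1"])
print(matrices["M_id2"])
```

Because the merge is an inner join, a row that exists in only one of the frames would simply be dropped from the matrices; if you need to keep such rows, passing `how="outer"` to `merge` is one option, at the cost of NaN entries in the result.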

Solution 1
2 Accepted 2021-02-05 14:58:58
