How to compare two DataFrames and return a matrix containing values where columns matched

I have two data frames as follows:

df1

   id  start  end  label    item
0   1      0    3   food  burger
1   1      4    6  drink    cola
2   2      0    3   food   fries

df2 

   id  start  end  label    item
0   1      0    3   food  burger
1   1      4    6   food    cola
2   2      0    3  drink   fries

I would like to compare the two data frames (by checking where they match in the id, start and end columns) and create a matrix of size 2 x (number of items) for each id. Each cell should contain the label corresponding to an item, with the first row taken from df1 and the second from df2. In this example:

M_id1: [[food, drink],      M_id2: [[food], 
        [food, food]]               [drink]]

I tried looking at the pandas documentation but didn't really find anything that could help me.

You can merge the dataframes df1 and df2 on the columns id, start and end, then group the merged dataframe on id. For each group, create a key-value pair inside a dict comprehension, where the key is built from the id and the value is the corresponding matrix of labels:

m = df1.merge(df2, on=['id', 'start', 'end'])
dct = {f'M_id{k}': g.filter(like='label').to_numpy().T for k, g in m.groupby('id')}
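
To see why filter(like='label') picks out the right columns, it may help to look at the merged frame. The sketch below rebuilds the question's data by hand (the construction of df1 and df2 is an assumption about how the frames were created) and prints the column names after the merge; pandas appends its default suffixes _x and _y to the overlapping non-key columns:

```python
import pandas as pd

# Rebuild the question's frames by hand (assumed construction).
df1 = pd.DataFrame({
    'id': [1, 1, 2],
    'start': [0, 4, 0],
    'end': [3, 6, 3],
    'label': ['food', 'drink', 'food'],
    'item': ['burger', 'cola', 'fries'],
})
# df2 differs from df1 only in its label column.
df2 = df1.assign(label=['food', 'food', 'drink'])

# Inner merge on the key columns; 'label' and 'item' exist in both
# frames, so they come back as label_x/item_x (df1) and label_y/item_y (df2).
m = df1.merge(df2, on=['id', 'start', 'end'])
print(list(m.columns))
# ['id', 'start', 'end', 'label_x', 'item_x', 'label_y', 'item_y']

# filter(like='label') keeps every column whose name contains 'label'.
label_cols = m.filter(like='label')
print(list(label_cols.columns))
# ['label_x', 'label_y']
```

Because merge defaults to an inner join, rows whose id, start and end do not appear in both frames are dropped before the grouping step.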

To access the matrix of labels, use a dictionary lookup:

>>> dct['M_id1']
array([['food', 'drink'],
       ['food', 'food']], dtype=object)

>>> dct['M_id2']
array([['food'],
       ['drink']], dtype=object)
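
If plain nested lists are preferred over NumPy arrays (closer to the notation used in the question), the same idea can be wrapped in a small helper. This is only a sketch: the function name label_matrices, its keys parameter and the _left/_right suffixes are illustrative choices, not part of pandas or of the answer above, and the frames are rebuilt by hand from the question's tables.

```python
import pandas as pd


def label_matrices(left, right, keys=('id', 'start', 'end')):
    """Return {'M_id<k>': 2 x n nested list of labels} for rows matching on keys.

    Hypothetical helper: row 0 holds the labels from `left`,
    row 1 the labels from `right`.
    """
    # Explicit suffixes make it obvious which frame each label came from.
    m = left.merge(right, on=list(keys), suffixes=('_left', '_right'))
    return {
        f'M_id{k}': g[['label_left', 'label_right']].to_numpy().T.tolist()
        for k, g in m.groupby('id')
    }


# The question's data, rebuilt by hand (assumed construction).
df1 = pd.DataFrame({
    'id': [1, 1, 2],
    'start': [0, 4, 0],
    'end': [3, 6, 3],
    'label': ['food', 'drink', 'food'],
    'item': ['burger', 'cola', 'fries'],
})
df2 = df1.assign(label=['food', 'food', 'drink'])

result = label_matrices(df1, df2)
print(result['M_id1'])  # [['food', 'drink'], ['food', 'food']]
print(result['M_id2'])  # [['food'], ['drink']]
```

Selecting the two label columns by name, rather than with filter(like='label'), fixes the row order of each matrix and avoids picking up any unrelated column whose name happens to contain "label".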
