How to compare two DataFrames and return a matrix containing values where columns matched
I have two data frames as follows:
df1
   id  start  end  label    item
0   1      0    3   food  burger
1   1      4    6  drink    cola
2   2      0    3   food   fries
df2
   id  start  end  label    item
0   1      0    3   food  burger
1   1      4    6   food    cola
2   2      0    3  drink   fries
I would like to compare the two data frames (by checking where they match in the id, start, end columns) and create a matrix of size 2 x (number of items) for each id. The cells should contain the label corresponding to an item. In this example:
M_id1: [[food, drink], M_id2: [[food],
[food, food]] [drink]]
I tried looking at the pandas documentation but didn't really find anything that could help me.
You can merge the dataframes df1 and df2 on the columns id, start, end, then group the merged dataframe on id. For each group, create a key-value pair inside a dict comprehension, where the key is the id and the value is the corresponding matrix of labels:
m = df1.merge(df2, on=['id', 'start', 'end'])
dct = {f'M_id{k}': g.filter(like='label').to_numpy().T for k, g in m.groupby('id')}
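It may help to see why filter(like='label') picks out exactly two columns. Both frames have non-key columns named label and item, so merge appends its default suffixes _x (left frame) and _y (right frame) to them. The sketch below rebuilds the two example frames by hand, which is an assumption since the question only shows their printed form, and inspects the merged columns:

```python
import pandas as pd

# Assumed reconstruction of the frames printed in the question.
df1 = pd.DataFrame({
    "id": [1, 1, 2],
    "start": [0, 4, 0],
    "end": [3, 6, 3],
    "label": ["food", "drink", "food"],
    "item": ["burger", "cola", "fries"],
})
df2 = pd.DataFrame({
    "id": [1, 1, 2],
    "start": [0, 4, 0],
    "end": [3, 6, 3],
    "label": ["food", "food", "drink"],
    "item": ["burger", "cola", "fries"],
})

# Inner merge on the three key columns; overlapping non-key
# columns receive the default suffixes "_x" and "_y".
m = df1.merge(df2, on=["id", "start", "end"])

merged_columns = list(m.columns)
label_columns = list(m.filter(like="label").columns)

print(merged_columns)
print(label_columns)
```

Here label_x holds the labels from df1 and label_y those from df2, so transposing the two-column array gives one row per frame.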
To access the matrix of labels use dictionary lookup:
>>> dct['M_id1']
array([['food', 'drink'], ['food', 'food']], dtype=object)
>>> dct['M_id2']
array([['food'], ['drink']], dtype=object)
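For reuse, the same steps can be wrapped in a small helper. This is only a sketch: the function name label_matrices, its keys parameter, and the suffixes _1 and _2 are hypothetical choices, not part of the answer above, and the frames are rebuilt by hand from the printed output in the question.

```python
import pandas as pd


def label_matrices(left, right, keys=("id", "start", "end")):
    """Return {'M_id<k>': 2 x n array of labels} for rows matching on `keys`.

    Hypothetical helper; assumes both frames have a 'label' column
    and that 'id' is one of the key columns.
    """
    # Explicit suffixes make it clear which frame each label came from.
    merged = left.merge(right, on=list(keys), suffixes=("_1", "_2"))
    return {
        f"M_id{k}": g[["label_1", "label_2"]].to_numpy().T
        for k, g in merged.groupby("id")
    }


# Assumed reconstruction of the frames printed in the question.
df1 = pd.DataFrame({
    "id": [1, 1, 2],
    "start": [0, 4, 0],
    "end": [3, 6, 3],
    "label": ["food", "drink", "food"],
    "item": ["burger", "cola", "fries"],
})
df2 = pd.DataFrame({
    "id": [1, 1, 2],
    "start": [0, 4, 0],
    "end": [3, 6, 3],
    "label": ["food", "food", "drink"],
    "item": ["burger", "cola", "fries"],
})

result = label_matrices(df1, df2)
for name, matrix in result.items():
    print(name, matrix.tolist())
```

Note that merge performs an inner join by default, so a row whose id, start, end combination appears in only one of the frames is dropped and contributes no column to the matrix.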