How to merge two data frames based on mutual information with python-pandas?

Given two data frames, df1 and df2, containing item_id-rating and item_id-class information respectively:

df1:

B0006IYIMW 5.0
B000A56PUO 3.0
B000AMLQQU 4.0
B000OVNMGE 1.0

df2:

B0006IYIMW iphone
B000OVNMGE samsung
B000AMLQQU htc
B000A56PUO nokia

I wish to merge df1 and df2 to get the full item_id-class-rating information, so the resulting data frame should be:

B0006IYIMW iphone 5.0
B000OVNMGE samsung 1.0
B000AMLQQU htc 4.0
B000A56PUO nokia 3.0

Please note that the row order of the two data frames may differ.

Could you please tell me how to do it? Thanks in advance!

Try this:

import pandas as pd

df1 = pd.DataFrame([['B0006IYIMW',5.0],['B000A56PUO', 3.0],['B000AMLQQU', 4.0],['B000OVNMGE', 1.0]],columns=('item_id','rating'))
df2 = pd.DataFrame([['B0006IYIMW','iphone'],['B000A56PUO', 'nokia'],['B000AMLQQU', 'htc'],['B000OVNMGE', 'samsung']],columns=('item_id','class'))

df_merged = df1.merge(df2,on='item_id')

print(df_merged)
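
A small sketch of why this still works when the rows are in different orders: merge matches rows on the values of item_id, not on their position. The how and validate arguments below are optional extras that are not part of the answer above; validate='one_to_one' assumes each id appears only once per frame and raises an error otherwise.

```python
import pandas as pd

# Same data as the question, with df2 deliberately in a different row order.
df1 = pd.DataFrame(
    [['B0006IYIMW', 5.0], ['B000A56PUO', 3.0],
     ['B000AMLQQU', 4.0], ['B000OVNMGE', 1.0]],
    columns=['item_id', 'rating'])
df2 = pd.DataFrame(
    [['B0006IYIMW', 'iphone'], ['B000OVNMGE', 'samsung'],
     ['B000AMLQQU', 'htc'], ['B000A56PUO', 'nokia']],
    columns=['item_id', 'class'])

# how='left' keeps every row of df2, in df2's order.
# validate='one_to_one' raises MergeError if an id is duplicated in either frame.
df_merged = df2.merge(df1, on='item_id', how='left', validate='one_to_one')

print(df_merged)
```

Calling merge on df2 rather than df1 is only a choice about which frame dictates the row and column order of the result.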

As usual, when I can't find a solution I start hacking together my own, and by the time I've worked through many bad results and finally have the right one, somebody else has already posted a one-line solution :) Here it is anyway:

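import pandas as pd

# Frames and row order as in the question; the column names are assumed.
df1 = pd.DataFrame({'item-id': ['B0006IYIMW', 'B000A56PUO', 'B000AMLQQU', 'B000OVNMGE'],
                    'item-rating': [5.0, 3.0, 4.0, 1.0]})
df2 = pd.DataFrame({'item-id': ['B0006IYIMW', 'B000OVNMGE', 'B000AMLQQU', 'B000A56PUO'],
                    'class': ['iphone', 'samsung', 'htc', 'nokia']})
# Assuming item ids are unique, build a list of indices: for each row of df2,
# the index of the matching item in df1.
df1_ids = df1['item-id'].tolist()
df1_index = [df1_ids.index(item) for item in df2['item-id']]
# Now reindex df1 according to the list and reset the index.
df1 = df1.reindex(df1_index).reset_index(drop=True)
# Now you can simply add the missing column.
df2['item-rating'] = df1['item-rating']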
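
The same lookup idea can also be written without tracking positions by hand. This is a rough alternative sketch, not the answerer's code: it builds a Series of ratings keyed by item id and maps it onto the other frame. The column names 'item-id', 'item-rating' and 'class' are assumptions, and the approach relies on the ids in df1 being unique.

```python
import pandas as pd

# Data from the question; the column names are assumed for illustration.
df1 = pd.DataFrame({'item-id': ['B0006IYIMW', 'B000A56PUO', 'B000AMLQQU', 'B000OVNMGE'],
                    'item-rating': [5.0, 3.0, 4.0, 1.0]})
df2 = pd.DataFrame({'item-id': ['B0006IYIMW', 'B000OVNMGE', 'B000AMLQQU', 'B000A56PUO'],
                    'class': ['iphone', 'samsung', 'htc', 'nokia']})

# Series indexed by item id; it acts as a lookup table while the ids are unique.
rating_by_id = df1.set_index('item-id')['item-rating']

# For every id in df2, look up its rating; ids missing from df1 would become NaN.
df2['item-rating'] = df2['item-id'].map(rating_by_id)

print(df2)
```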

Setup

import pandas as pd

idx = pd.Index(['B0006IYIMW', 'B000A56PUO', 'B000AMLQQU', 'B000OVNMGE'],
               name='item-id')
df1 = pd.DataFrame([5., 3., 4., 1.],
                   columns=['rating'], index=idx)
# classes listed in the order of idx, matching the pairs given in the question
df2 = pd.DataFrame(['iphone', 'nokia', 'htc', 'samsung'],
                   columns=['class'], index=idx)

Solution

df = pd.concat([df2, df1], axis=1)

Demonstration

print(df)

              class  rating
item-id                    
B0006IYIMW   iphone     5.0
B000A56PUO    nokia     3.0
B000AMLQQU      htc     4.0
B000OVNMGE  samsung     1.0
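
If the ids start out as ordinary columns and the two frames are ordered differently, as in the question, one possible sketch is to move the ids into the index first so that concat can align on them. The column name item_id is an assumption here, and the approach expects the ids to be unique within each frame.

```python
import pandas as pd

# Data from the question, with the two frames in different row orders.
df1 = pd.DataFrame({'item_id': ['B0006IYIMW', 'B000A56PUO', 'B000AMLQQU', 'B000OVNMGE'],
                    'rating': [5.0, 3.0, 4.0, 1.0]})
df2 = pd.DataFrame({'item_id': ['B0006IYIMW', 'B000OVNMGE', 'B000AMLQQU', 'B000A56PUO'],
                    'class': ['iphone', 'samsung', 'htc', 'nokia']})

# Move the ids into the index so that concat can align on them.
left = df2.set_index('item_id')
right = df1.set_index('item_id')

# axis=1 places the frames side by side, matching rows by index label
# rather than by position.
df = pd.concat([left, right], axis=1)

# Look rows up by label instead of relying on the row order of the result.
print(df.loc['B000OVNMGE'])
```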
