Join with MultiIndex DataFrames Creates Empty Fields [pandas]
I'm trying to join two DataFrames in pandas on two fields, 'date_key' and 'user_uuid', but the joined columns come back empty, even though I can find matching keys when I search both tables.
DataFrame one (icloset) looks like:
In [167]: icloset.head()
Out[167]:
                                               count     ASL75
date_key user_uuid
20130917 000a26bf-e7ff-3124-9b00-b227ee155e7f     11   9.03510
         0017b444-83f7-3adb-9727-926de4041731      3  45.05510
         0022c69b-f1f5-301e-812d-89725e17c9dd     19  31.71980
         00453fcd-93bd-373e-9248-f821ce8279f2     10  17.68785
         004a050d-f855-3c9c-bfe0-5c504df965bc      8  45.20115
DataFrame two (definedRIDs) looks like:
In [170]: definedRIDs.head()
Out[170]:
     rid                             user_uuid  rid_slots last48status bad_RID  \
0  48830  2eda12da-d613-3e1e-95de-de3c75a5f9ef          1   Fulfilling   False
1  51025  a466303a-d66d-3db8-b640-c4d57d134404          1   Fulfilling   False
2  51457  c41d87d3-8abc-328d-ae00-c63d7cf81ef2          1    Fulfilled   False
3  48626  97ff5c81-e5df-30ac-9b7a-bda73fbf499f          1    Fulfilled   False
4  51450  0ac72f09-0fb7-35ae-b8a2-ee6d131100b0          1    Fulfilled   False

   date_key
0  20130924
1  20130927
2  20130927
3  20130923
4  20130927
I made sure to strip out the index of definedRIDs so that it looks like this example from the docs.
For some reason, when I try to replicate the example in the docs, the joined columns (count and ASL75) come back empty:
In [171]: definedRIDs.join(icloset,on=['date_key','user_uuid'])
Out[171]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7623 entries, 0 to 7622
Data columns (total 8 columns):
rid 7623 non-null values
user_uuid 7623 non-null values
rid_slots 7623 non-null values
last48status 7623 non-null values
bad_RID 7623 non-null values
date_key 7623 non-null values
count 0 non-null values
ASL75 0 non-null values
dtypes: bool(1), float64(2), object(5)
But when I exported the DataFrames to CSV files and searched them by hand, I found user_uuid and date_key combinations that matched in both files. Any ideas why the join is missing these matches?
Thank you
Reset the index on the icloset DataFrame. When you specify the on parameter, it tries to match the columns, but there is no date_key or user_uuid column in the icloset frame (since they are in the index), so it can't find a match.
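# after reset_index the keys are ordinary columns, so match column to column with merge
definedRIDs.merge(icloset.reset_index(),
                  on=['date_key','user_uuid'], how='left')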
If you're using the on parameter, the values passed in should be column names.
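For reference, per the pandas documentation, `DataFrame.join` with `on` looks up the caller's columns in the index of the other frame, while `DataFrame.merge` with `on` matches columns that exist in both frames. A minimal sketch of the two calls side by side; the frames `left` and `right` and their values are made up for illustration and are not the data from the question:

```python
import pandas as pd

# Hypothetical toy frames, not the asker's data.
left = pd.DataFrame({
    "date_key": ["20130917", "20130917", "20130924"],
    "user_uuid": ["u1", "u2", "u3"],
    "rid": [1, 2, 3],
})
right = pd.DataFrame({
    "date_key": ["20130917", "20130917"],
    "user_uuid": ["u1", "u2"],
    "count": [11, 3],
})

keys = ["date_key", "user_uuid"]

# join: `on` names columns of the caller, matched against the INDEX of the other frame
via_join = left.join(right.set_index(keys), on=keys)

# merge: `on` names columns present in BOTH frames
via_merge = left.merge(right, on=keys, how="left")

print(via_join)
print(via_merge)
```

Both calls are left joins here, so the row for "u3" is kept with a missing count in each result.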
Looks like I just needed to make sure the types on the keys were both dtype=object.
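A quick way to spot this kind of mismatch is to compare the dtype of each key column with the dtype of the matching index level. The sketch below uses small stand-in frames; the values, and the assumption that the indexed frame holds integer date keys while the other holds strings, are mine and only illustrate the idea. It also shows why a hand search through CSV files can look like a match: the text is identical, but the typed keys are not.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (values and dtypes are assumptions).
closet = pd.DataFrame({"date_key": [20130917], "user_uuid": ["000a26bf"], "count": [11]})
icloset = closet.set_index(["date_key", "user_uuid"])
definedRIDs = pd.DataFrame({"date_key": ["20130917"], "user_uuid": ["000a26bf"], "rid": [48830]})

keys = ["date_key", "user_uuid"]

# Compare each key column's dtype with the dtype of the same-named index level.
mismatched = [
    k for k in keys
    if definedRIDs[k].dtype != icloset.index.get_level_values(k).dtype
]
print(mismatched)

# The same date written as a string and as an integer are different keys.
str_key = ("20130917", "000a26bf")
int_key = (20130917, "000a26bf")
print(str_key in icloset.index, int_key in icloset.index)
```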
The correct solution was in fact to join the frame with no index (on the left) to the frame with the MultiIndex on the right:
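# cast the key to str on both sides so the key dtypes agree
closet['date_key'] = closet['date_key'].astype(str)
definedRIDs['date_key'] = definedRIDs['date_key'].astype(str)
# index the right-hand frame by the join keys, then join the un-indexed frame to it
icloset = closet.set_index(['date_key','user_uuid'])
RIDdata = definedRIDs.join(icloset, on=['date_key','user_uuid'], how='inner')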
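The same steps can be tried end to end on toy data. This is only a sketch: the helper name `join_on_string_keys` is hypothetical, the frames are made-up stand-ins, and casting every key column to str is an assumption that suits keys like these, not a general rule.

```python
import pandas as pd

def join_on_string_keys(left, right, keys, how="inner"):
    """Cast the key columns of both frames to str, index `right` by them, then join.

    Hypothetical helper for illustration; works on copies so the inputs are untouched.
    """
    left = left.copy()
    right = right.copy()
    for k in keys:
        left[k] = left[k].astype(str)
        right[k] = right[k].astype(str)
    # `on` names columns of `left`; they are matched against the MultiIndex of `right`
    return left.join(right.set_index(keys), on=keys, how=how)

# Made-up data: date_key is an integer in one frame and a string in the other.
closet = pd.DataFrame({
    "date_key": [20130917, 20130917],
    "user_uuid": ["u1", "u2"],
    "count": [11, 3],
    "ASL75": [9.0351, 45.0551],
})
definedRIDs = pd.DataFrame({
    "rid": [48830, 51025],
    "user_uuid": ["u1", "u9"],
    "date_key": ["20130917", "20130927"],
})

RIDdata = join_on_string_keys(definedRIDs, closet, ["date_key", "user_uuid"])
print(RIDdata)
```

With how="inner", only the row whose (date_key, user_uuid) pair exists in both frames survives, which makes a key mismatch easy to notice from the row count.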
Hope this helps someone else avoid this mistake later, and clarifies joins with indexing a little.