
Join with MultiIndex DataFrames Creates Empty Fields [pandas]

I'm trying to join two DataFrames in pandas on two fields, 'date_key' and 'user_uuid', but when I do, the joined columns come back empty, even though searching the tables by hand shows overlapping matches.

DataFrame one (icloset) looks like:

In [167]: icloset.head()
Out[167]: 
                                           count     ASL75
date_key user_uuid                                            
20130917 000a26bf-e7ff-3124-9b00-b227ee155e7f     11   9.03510
         0017b444-83f7-3adb-9727-926de4041731      3  45.05510
         0022c69b-f1f5-301e-812d-89725e17c9dd     19  31.71980
         00453fcd-93bd-373e-9248-f821ce8279f2     10  17.68785
         004a050d-f855-3c9c-bfe0-5c504df965bc      8  45.20115

DataFrame two (definedRIDs) looks like:

In [170]: definedRIDs.head()
Out[170]: 
     rid                             user_uuid  rid_slots last48status  bad_RID  date_key
0  48830  2eda12da-d613-3e1e-95de-de3c75a5f9ef          1   Fulfilling    False  20130924
1  51025  a466303a-d66d-3db8-b640-c4d57d134404          1   Fulfilling    False  20130927
2  51457  c41d87d3-8abc-328d-ae00-c63d7cf81ef2          1    Fulfilled    False  20130927
3  48626  97ff5c81-e5df-30ac-9b7a-bda73fbf499f          1    Fulfilled    False  20130923
4  51450  0ac72f09-0fb7-35ae-b8a2-ee6d131100b0          1    Fulfilled    False  20130927

I made sure to strip out the index of definedRIDs so that it looks like the example in the docs.

For some reason, when I try to replicate the example in the docs, the merged fields (count and ASL75) come back empty:

In [171]: definedRIDs.join(icloset,on=['date_key','user_uuid'])
Out[171]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7623 entries, 0 to 7622
Data columns (total 8 columns):
rid             7623  non-null values
user_uuid       7623  non-null values
rid_slots       7623  non-null values
last48status    7623  non-null values
bad_RID         7623  non-null values
date_key        7623  non-null values
count           0  non-null values
ASL75           0  non-null values
dtypes: bool(1), float64(2), object(5)

But when I exported the DataFrames to CSV files and searched them by hand, I found user_uuid and date_key combinations that appear in both files. Any ideas why the join isn't matching them?
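Writing to CSV renders the integer 20130924 and the string '20130924' as the same text, so a search by hand can find matches that pandas treats as different keys. A minimal sketch for checking the key dtypes and the actual overlap inside pandas instead; the toy frames, the int-versus-string split, and the helper key_report are all hypothetical:

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames (assumption): date_key is
# int64 in definedRIDs but a string level inside icloset's MultiIndex.
definedRIDs = pd.DataFrame({
    "rid": [48830, 51025],
    "user_uuid": ["u1", "u2"],
    "date_key": [20130924, 20130927],
})
icloset = pd.DataFrame({
    "date_key": ["20130924", "20130930"],
    "user_uuid": ["u1", "u3"],
    "count": [11, 3],
    "ASL75": [9.0351, 45.0551],
}).set_index(["date_key", "user_uuid"])

KEYS = ["date_key", "user_uuid"]


def key_report(left, right, keys):
    """Print each key's dtype on both sides and return how many
    (date_key, user_uuid) pairs match exactly, in value and type."""
    for level, key in enumerate(keys):
        right_dtype = right.index.get_level_values(level).dtype
        print(f"{key}: left={left[key].dtype}, right={right_dtype}")
    left_pairs = set(zip(*(left[k] for k in keys)))
    right_pairs = set(right.index)  # MultiIndex iterates as tuples
    return len(left_pairs & right_pairs)


# 20130924 (int) != "20130924" (str), so nothing matches as stored...
print(key_report(definedRIDs, icloset, KEYS))
# ...but after casting the left key to str, one pair lines up.
print(key_report(definedRIDs.astype({"date_key": str}), icloset, KEYS))
```

If the first count is 0 while the by-hand search finds matches, a dtype mismatch on one of the keys is a likely suspect.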

Thank you

Reset the index on the icloset DataFrame. When you specify the on parameter, it tries to match columns, but there is no date_key or user_uuid column in the icloset frame (they are in the index), so it can't find a match.

definedRIDs.join(icloset.reset_index(),
                 on=['date_key','user_uuid'])

If you're using the on parameter, the values passed in should be column names.
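DataFrame.join(on=...) looks up columns of the calling frame in the index of the other frame. If you would rather have the keys as plain columns on both sides, as this answer suggests, DataFrame.merge is the method that matches column to column. A sketch on hypothetical toy frames (both date_key columns are strings here by assumption); indicator=True adds a _merge column showing which rows found a partner:

```python
import pandas as pd

# Hypothetical toy data; both date_key columns are strings here.
definedRIDs = pd.DataFrame({
    "rid": [48830, 51025],
    "user_uuid": ["u1", "u2"],
    "date_key": ["20130924", "20130927"],
})
icloset = pd.DataFrame({
    "date_key": ["20130924", "20130930"],
    "user_uuid": ["u1", "u3"],
    "count": [11, 3],
    "ASL75": [9.0351, 45.0551],
}).set_index(["date_key", "user_uuid"])

# reset_index() turns the MultiIndex levels back into ordinary columns,
# so merge can match column against column on both keys.
merged = definedRIDs.merge(
    icloset.reset_index(),
    on=["date_key", "user_uuid"],
    how="left",
    indicator=True,  # adds "_merge": "both", "left_only" or "right_only"
)
print(merged[["rid", "count", "_merge"]])
```

The key dtypes still have to agree with merge; it only changes where the keys live.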

Looks like I just needed to make sure the key types on both sides were dtype=object.

The correct solution was in fact to join the frame with no index (on the left) to the frame with the MultiIndex on the right, after giving the keys the same type:

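# Give the join keys the same dtype (str) on both sides
closet['date_key'] = closet['date_key'].astype(str)
definedRIDs['date_key'] = definedRIDs['date_key'].astype(str)

# Move the keys into a MultiIndex on the right-hand frame
icloset = closet.set_index(['date_key', 'user_uuid'])

# on names columns of the left frame, matched against icloset's index
RIDdata = definedRIDs.join(icloset, on=['date_key', 'user_uuid'], how='inner')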
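A self-contained version of these steps on hypothetical toy data (the values and the int-versus-string split are invented; only the column names come from the question), also showing what how='inner' changes: the default how='left' keeps every definedRIDs row and fills NaN where no key matches, while how='inner' keeps only matched rows. Note that recent pandas releases may raise a ValueError when asked to join int64 keys against string keys, rather than returning NaN as in the question, so the symptom can look different today.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
closet = pd.DataFrame({
    "date_key": [20130924, 20130930],   # int64 here (assumption)
    "user_uuid": ["u1", "u3"],
    "count": [11, 3],
    "ASL75": [9.0351, 45.0551],
})
definedRIDs = pd.DataFrame({
    "rid": [48830, 51025],
    "user_uuid": ["u1", "u2"],
    "date_key": ["20130924", "20130927"],  # strings here (assumption)
})

# Give the join keys the same dtype on both sides.
closet["date_key"] = closet["date_key"].astype(str)
definedRIDs["date_key"] = definedRIDs["date_key"].astype(str)

# Keys go into the right-hand frame's MultiIndex; `on` names left-hand columns.
icloset = closet.set_index(["date_key", "user_uuid"])

left = definedRIDs.join(icloset, on=["date_key", "user_uuid"])  # how="left" by default
inner = definedRIDs.join(icloset, on=["date_key", "user_uuid"], how="inner")

print(len(left), len(inner))
print(inner)
```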

Hope this helps someone else avoid this mistake later, and clarifies joins on indexes a little.
