
Join with MultiIndex DataFrames Creates Empty Fields [pandas]

I'm trying to join two DataFrames in pandas on two fields, 'date_key' and 'user_uuid'. When I do, the joined fields come back empty, even though searching the tables by hand shows overlapping keys.

DataFrame one (icloset) looks like:

In [167]: icloset.head()
Out[167]: 
                                           count     ASL75
date_key user_uuid                                            
20130917 000a26bf-e7ff-3124-9b00-b227ee155e7f     11   9.03510
         0017b444-83f7-3adb-9727-926de4041731      3  45.05510
         0022c69b-f1f5-301e-812d-89725e17c9dd     19  31.71980
         00453fcd-93bd-373e-9248-f821ce8279f2     10  17.68785
         004a050d-f855-3c9c-bfe0-5c504df965bc      8  45.20115

DataFrame two (definedRIDs) looks like:

In [170]: definedRIDs.head()
Out[170]: 
     rid                             user_uuid rid_slots last48status bad_RID  \
0  48830  2eda12da-d613-3e1e-95de-de3c75a5f9ef         1  Fulfilling    False   
1  51025  a466303a-d66d-3db8-b640-c4d57d134404         1  Fulfilling    False   
2  51457  c41d87d3-8abc-328d-ae00-c63d7cf81ef2         1   Fulfilled    False   
3  48626  97ff5c81-e5df-30ac-9b7a-bda73fbf499f         1   Fulfilled    False   
4  51450  0ac72f09-0fb7-35ae-b8a2-ee6d131100b0         1   Fulfilled    False   

   date_key  
0  20130924  
1  20130927  
2  20130927  
3  20130923  
4  20130927 

I made sure to strip out the index of definedRIDs so that it matches the example in the docs.

For some reason, when I try to replicate the docs example, the joined columns (count and ASL75) come back empty:

In [171]: definedRIDs.join(icloset,on=['date_key','user_uuid'])
Out[171]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7623 entries, 0 to 7622
Data columns (total 8 columns):
rid             7623  non-null values
user_uuid       7623  non-null values
rid_slots       7623  non-null values
last48status    7623  non-null values
bad_RID         7623  non-null values
date_key        7623  non-null values
count           0  non-null values
ASL75           0  non-null values
dtypes: bool(1), float64(2), object(5)
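This output has a telling shape. All 7623 rows are kept, and count and ASL75 are entirely null. That is what DataFrame.join produces when no keys match. join defaults to how='left', so every row of the calling frame survives, and the columns brought in from the other frame are filled with NaN. Below is a minimal sketch using made-up toy frames whose keys deliberately do not overlap. The names left and right are illustrative only:

```python
import pandas as pd

# Made-up toy data: none of the left frame's (date_key, user_uuid) pairs
# appear in the right frame's MultiIndex.
left = pd.DataFrame({
    "date_key": ["20130924", "20130927"],
    "user_uuid": ["a", "b"],
    "rid": [48830, 51025],
})
right = pd.DataFrame({
    "date_key": ["20130917", "20130917"],
    "user_uuid": ["x", "y"],
    "count": [11, 3],
}).set_index(["date_key", "user_uuid"])

# join() defaults to how="left": every left row is kept, and the columns
# coming from `right` are NaN wherever no key pair matched.
joined = left.join(right, on=["date_key", "user_uuid"])
print(joined)
```

So an all-null result points at the keys not comparing equal, rather than at rows being lost.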

But when I exported the DataFrames to CSV files and searched them by hand, I found user_uuid and date_key combinations that appear in both files. Any ideas why the join isn't matching them?

Thank you

Reset the index on the icloset DataFrame. When you specify the on parameter, it tries to match columns, but the icloset frame has neither a date_key nor a user_uuid column (they are in the index), so it can't find a match.

definedRIDs.join(icloset.reset_index(),
                 on=['date_key','user_uuid'])

If you're using the on parameter, the values passed in should be column names.
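Suppose you do move the keys out of the index, as reset_index() does. The frames can then be combined with DataFrame.merge, which matches columns to columns when both frames carry the keys as ordinary columns. By contrast, join's on names columns of the calling frame and matches them against the index of the other frame. The sketch below swaps in merge for join. The stand-in frames rids and closet_idx are hypothetical:

```python
import pandas as pd

# Hypothetical stand-ins: rids has the keys as columns (like definedRIDs),
# closet_idx has them in a MultiIndex (like icloset).
rids = pd.DataFrame({
    "rid": [48830, 51025],
    "user_uuid": ["u1", "u2"],
    "date_key": ["20130924", "20130927"],
})
closet_idx = pd.DataFrame({
    "date_key": ["20130924", "20130917"],
    "user_uuid": ["u1", "u9"],
    "count": [11, 3],
    "ASL75": [9.0351, 45.0551],
}).set_index(["date_key", "user_uuid"])

# reset_index() turns the two index levels back into ordinary columns,
# so merge(on=...) can match column-to-column on both frames.
merged = rids.merge(closet_idx.reset_index(), on=["date_key", "user_uuid"], how="left")
print(merged)
```

Here the first row finds its (date_key, user_uuid) pair and picks up count and ASL75. The second row has no partner and gets NaN.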

Looks like I just needed to make sure the key columns had the same type on both sides (both dtype=object).
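You can spot this kind of mismatch before joining. Compare the dtype of each key column on the left with the dtype of the matching index level on the right. The helper key_dtype_mismatches below is hypothetical (it is not part of pandas), and the frames are made up. The helper relies only on Series.dtype and Index.get_level_values:

```python
import pandas as pd


def key_dtype_mismatches(left, right, keys):
    """Hypothetical helper: compare each key column of `left` with the
    same-named index level of `right`; return (key, left dtype, right dtype)
    for every key whose dtypes differ."""
    mismatches = []
    for key in keys:
        left_dtype = left[key].dtype
        right_dtype = right.index.get_level_values(key).dtype
        if left_dtype != right_dtype:
            mismatches.append((key, str(left_dtype), str(right_dtype)))
    return mismatches


# Made-up data: date_key is an integer on the left but a string level on the right.
rids = pd.DataFrame({"date_key": [20130924], "user_uuid": ["u1"]})
closet_idx = pd.DataFrame(
    {"date_key": ["20130924"], "user_uuid": ["u1"], "count": [11]}
).set_index(["date_key", "user_uuid"])

before = key_dtype_mismatches(rids, closet_idx, ["date_key", "user_uuid"])
print(before)

# Converting the left key to strings makes the dtypes agree.
rids["date_key"] = rids["date_key"].astype(str)
after = key_dtype_mismatches(rids, closet_idx, ["date_key", "user_uuid"])
print(after)
```

With this data, the first check flags only date_key. After the astype(str) conversion, the list comes back empty. The exact dtype names printed can vary between pandas versions.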

The correct solution was in fact to give the key columns matching dtypes, then join the frame with the default integer index (on the left) to the frame with the MultiIndex (on the right):

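# Give date_key the same dtype (str) on both sides; an int key never matches a str key
closet['date_key'] = closet['date_key'].astype(str)
definedRIDs['date_key'] = definedRIDs['date_key'].astype(str)

# Move the join keys into a MultiIndex on the right-hand frame
icloset = closet.set_index(['date_key', 'user_uuid'])

# 'on' names columns of definedRIDs, matched against icloset's index levels;
# how='inner' keeps only rows whose key pair appears in both frames
RIDdata = definedRIDs.join(icloset, on=['date_key', 'user_uuid'], how='inner')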
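To see the fix end to end, here is a self-contained sketch. It uses made-up stand-ins for closet and definedRIDs, in which date_key starts out as an integer in one frame and a string in the other. The data is illustrative and not from the question:

```python
import pandas as pd

# Made-up stand-ins: closet stores date_key as integers, definedRIDs as strings.
closet = pd.DataFrame({
    "date_key": [20130917, 20130924],
    "user_uuid": ["u1", "u2"],
    "count": [11, 3],
    "ASL75": [9.0351, 45.0551],
})
definedRIDs = pd.DataFrame({
    "rid": [48830, 51025],
    "user_uuid": ["u2", "u3"],
    "date_key": ["20130924", "20130927"],
})

# Give both key columns the same dtype before building the MultiIndex.
closet["date_key"] = closet["date_key"].astype(str)
definedRIDs["date_key"] = definedRIDs["date_key"].astype(str)
icloset = closet.set_index(["date_key", "user_uuid"])

# Columns of the caller are matched against icloset's index levels;
# how="inner" drops definedRIDs rows with no matching (date_key, user_uuid).
RIDdata = definedRIDs.join(icloset, on=["date_key", "user_uuid"], how="inner")
print(RIDdata)
```

Only the definedRIDs row whose (date_key, user_uuid) pair also appears in icloset survives the inner join. With the default how='left', the unmatched row would be kept, with NaN in count and ASL75.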

Hope this helps someone else avoid this mistake, and clarifies a little how joins work with indexes.
