Joining MultiIndex DataFrames creates empty fields [pandas]

Question

I'm trying to join two DataFrames in pandas on two fields, 'date_key' and 'user_uuid', but when I do, the joined columns come back empty, even though I can find overlapping keys when I search the tables for matches.

DataFrame one (icloset) looks like:

In [167]: icloset.head()
Out[167]: 
                                           count     ASL75
date_key user_uuid                                            
20130917 000a26bf-e7ff-3124-9b00-b227ee155e7f     11   9.03510
         0017b444-83f7-3adb-9727-926de4041731      3  45.05510
         0022c69b-f1f5-301e-812d-89725e17c9dd     19  31.71980
         00453fcd-93bd-373e-9248-f821ce8279f2     10  17.68785
         004a050d-f855-3c9c-bfe0-5c504df965bc      8  45.20115

DataFrame two (definedRIDs) looks like:

In [170]: definedRIDs.head()
Out[170]: 
     rid                             user_uuid rid_slots last48status bad_RID  \
0  48830  2eda12da-d613-3e1e-95de-de3c75a5f9ef         1  Fulfilling    False   
1  51025  a466303a-d66d-3db8-b640-c4d57d134404         1  Fulfilling    False   
2  51457  c41d87d3-8abc-328d-ae00-c63d7cf81ef2         1   Fulfilled    False   
3  48626  97ff5c81-e5df-30ac-9b7a-bda73fbf499f         1   Fulfilled    False   
4  51450  0ac72f09-0fb7-35ae-b8a2-ee6d131100b0         1   Fulfilled    False   

   date_key  
0  20130924  
1  20130927  
2  20130927  
3  20130923  
4  20130927

I made sure to strip out the index of definedRIDs so that it looks like this example from the docs.

For some reason, when I try to replicate the example in the docs, I get empty results in the merged fields (count and ASL75):

In [171]: definedRIDs.join(icloset,on=['date_key','user_uuid'])
Out[171]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7623 entries, 0 to 7622
Data columns (total 8 columns):
rid             7623  non-null values
user_uuid       7623  non-null values
rid_slots       7623  non-null values
last48status    7623  non-null values
bad_RID         7623  non-null values
date_key        7623  non-null values
count           0  non-null values
ASL75           0  non-null values
dtypes: bool(1), float64(2), object(5)

But when I exported the DataFrames to CSV files and searched them by hand, I found user_uuid and date_key combinations that matched in both files. Any ideas on why I'm getting this mismatch on the join?

Thank you

Answer 1

Reset the index on the icloset DataFrame so that date_key and user_uuid become ordinary columns, then match on those columns. Note that join's on parameter names columns of the calling frame and looks them up in the index of the other frame, so once the keys are columns in both frames, merge is the call that matches them by name:

definedRIDs.merge(icloset.reset_index(),
                  on=['date_key','user_uuid'],
                  how='left')

With merge, the values passed to on should be column names present in both frames.
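To make the difference between the two calls concrete, here is a small sketch with made-up frames (the short uuids and the values are hypothetical stand-ins, not the asker's data). It illustrates that join's on parameter names columns of the calling frame and looks them up in the other frame's index, while merge matches column to column once the index has been reset.

```python
import pandas as pd

# Hypothetical stand-in for icloset: the keys live in a two-level MultiIndex.
icloset = pd.DataFrame(
    {
        "date_key": ["20130917", "20130917", "20130924"],
        "user_uuid": ["u1", "u2", "u3"],
        "count": [11, 3, 19],
    }
).set_index(["date_key", "user_uuid"])

# Hypothetical stand-in for definedRIDs: the keys are ordinary columns.
definedRIDs = pd.DataFrame(
    {
        "rid": [48830, 51025],
        "user_uuid": ["u3", "u9"],
        "date_key": ["20130924", "20130927"],
    }
)

# join: `on` names columns of the left frame, matched against the right index.
joined = definedRIDs.join(icloset, on=["date_key", "user_uuid"])

# merge: after reset_index() both frames expose the keys as columns.
merged = definedRIDs.merge(
    icloset.reset_index(), on=["date_key", "user_uuid"], how="left"
)

print(joined)
print(merged)
```

Both calls keep every row of definedRIDs; a row whose key pair has no partner in icloset gets a missing value in count.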

Answer 2

Looks like I just needed to make sure the types on the keys were both dtype=object.

The correct solution was in fact to join the frame with no index (on the left) to the frame with the MultiIndex (on the right):

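# closet is the un-indexed version of icloset; cast date_key to str in both frames
closet['date_key'] = closet['date_key'].astype(str)
definedRIDs['date_key'] = definedRIDs['date_key'].astype(str)

# move the join keys into a MultiIndex on the right-hand frame
icloset = closet.set_index(['date_key','user_uuid'])

# on= names columns of definedRIDs, matched against the index of icloset
RIDdata = definedRIDs.join(icloset,on=['date_key','user_uuid'],how='inner')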
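A quick way to catch this kind of problem before joining is to compare the dtypes of the key columns on both sides. The sketch below uses made-up data (the frame contents and the `mismatched` helper list are assumptions for illustration) in which date_key is an integer in one frame and a string in the other. Depending on the pandas version, joining on such keys may raise an error or, as in the question, silently match nothing, so the sketch casts the keys first.

```python
import pandas as pd

# Hypothetical data: date_key is an integer here...
closet = pd.DataFrame(
    {
        "date_key": [20130924, 20130927],
        "user_uuid": ["u1", "u2"],
        "count": [11, 3],
    }
)
# ...and a string here, so 20130924 would never equal "20130924".
definedRIDs = pd.DataFrame(
    {
        "rid": [48830, 51025],
        "user_uuid": ["u1", "u2"],
        "date_key": ["20130924", "20130927"],
    }
)

keys = ["date_key", "user_uuid"]

# List the key columns whose dtypes differ between the two frames.
mismatched = [k for k in keys if closet[k].dtype != definedRIDs[k].dtype]
print("mismatched key dtypes:", mismatched)

# Cast the mismatched keys to strings on both sides before indexing.
for k in mismatched:
    closet[k] = closet[k].astype(str)
    definedRIDs[k] = definedRIDs[k].astype(str)

# Same pattern as above: MultiIndex on the right, key columns on the left.
icloset = closet.set_index(keys)
RIDdata = definedRIDs.join(icloset, on=keys, how="inner")
print(RIDdata)
```

Casting to str is only one option; casting both sides to integers would work equally well as long as the two frames end up with the same key type.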

Hope this helps someone else avoid this mistake later, and clarifies joins with indexes a little.
