Joining/merging two pandas DataFrames: matching a level of one index to the index of the other

I am trying to join two pandas DataFrames. The left one has a MultiIndex and the right one is a plain DataFrame with a single-level index. I would like to join the index of the right DataFrame on one of the levels of the left DataFrame. For example, given the following:

           Age
Boys          
      Sam   21
      John  22
Girls         
      Lisa  23

and

      Points
John       1
Lisa       2
Sam        3

I would like to end up with this:

           Age Points
Boys                 
      Sam   21      3
      John  22      1
Girls                
      Lisa  23      2

This is how I have worked it out; I am just wondering whether there is a more straightforward way.

In[2]: import pandas as pd
In[3]: # Note: the `codes` argument was named `labels` before pandas 0.24.
idx = pd.MultiIndex(levels=[['Boys', 'Girls', ''],['Sam', 'John', 'Lisa', '']], codes=[[0,2,2,1,2],[3,0,1,3,2]])
df1 = pd.DataFrame({'Age':['',21,22,'',23]}, index=idx)
df2 = pd.DataFrame({'Points':[1, 2, 3]}, index=['John','Lisa','Sam'])

In[4]: df1
Out[4]: 
           Age
Boys          
      Sam   21
      John  22
Girls         
      Lisa  23

In[5]: df2
Out[5]: 
      Points
John       1
Lisa       2
Sam        3

I then wrote this loop, which "transforms" the right DataFrame by giving it a MultiIndex and rearranging its values to match:

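lvl = df1.index.levels[1]
lbl = df1.index.codes[1]  # `codes` was named `labels` before pandas 0.24
y = df2.iloc[:,0].values.tolist()
names = df2.index.tolist()
z=[]
for x in [lvl[k] for k in lbl]:
    try:
        pos = names.index(x)  # position of this name in df2
    except ValueError:
        z.append('')  # no match, e.g. the blank header rows
    else:
        z.append(y[pos])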

temp=pd.DataFrame(index=df1.index)
temp['Points'] = z

I can now join them:

out = df1.join(temp)
out
Out[6]: 
           Age Points
Boys                 
      Sam   21      3
      John  22      1
Girls                
      Lisa  23      2
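
For comparison, the loop-and-join steps above can probably be condensed into a single lookup on the second index level. This is only a minimal sketch: it assumes df1 is built without the blank placeholder rows, and it relies on Index.map accepting a Series as the mapper. The variable names `names` and `out` are just illustrative.

```python
import pandas as pd

# Same data as above, minus the blank placeholder rows (an assumption of this sketch).
idx = pd.MultiIndex.from_tuples(
    [("Boys", "Sam"), ("Boys", "John"), ("Girls", "Lisa")]
)
df1 = pd.DataFrame({"Age": [21, 22, 23]}, index=idx)
df2 = pd.DataFrame({"Points": [1, 2, 3]}, index=["John", "Lisa", "Sam"])

# Look up each second-level label in df2; labels missing from df2 would give NaN.
names = df1.index.get_level_values(1)
out = df1.copy()
out["Points"] = names.map(df2["Points"]).to_numpy()
print(out)
```

The lookup is positional with respect to df1, so the row order of df1 is kept and no temporary DataFrame is needed.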

Name your index levels; this tells pandas how to join your DataFrames:

In [72]: df1
Out[72]:
           Age
sex   name
Boys
      Sam   21
      John  22
Girls
      Lisa  23

In [73]: df1.index.names=['sex','name']

In [74]: df2.index.name = 'name'

Joining is now straightforward:

In [75]: df1.join(df2)
Out[75]:
           Age  Points
sex   name
Boys               NaN
      Sam   21       3
      John  22       1
Girls              NaN
      Lisa  23       2

P.S. The NaNs come from the empty placeholder rows ('Boys' and 'Girls') in df1, which have no matching name in df2.
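
As a self-contained illustration of the approach above, here is a minimal sketch that builds both frames with named indexes from the start. It assumes the blank placeholder rows are left out of df1 (which is why no NaNs appear), and it uses MultiIndex.from_tuples in place of the explicit levels/codes constructor from the question.

```python
import pandas as pd

# Left frame: two-level index with named levels 'sex' and 'name'.
idx = pd.MultiIndex.from_tuples(
    [("Boys", "Sam"), ("Boys", "John"), ("Girls", "Lisa")],
    names=["sex", "name"],
)
df1 = pd.DataFrame({"Age": [21, 22, 23]}, index=idx)

# Right frame: single-level index whose name matches the 'name' level of df1.
df2 = pd.DataFrame(
    {"Points": [1, 2, 3]},
    index=pd.Index(["John", "Lisa", "Sam"], name="name"),
)

# pandas matches df2's index to the level of df1 that has the same name.
out = df1.join(df2)
print(out)
```

The default how='left' keeps every row of df1 in its original order; a name missing from df2 would simply get NaN in the Points column.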
