
Python / Pandas - Merging on index with multiple repeated keys

I have this dataframe:

df1:
                year     revenues  
index                                                                    
03374312000153  2010        25432 
03374312000153  2009        25433 
48300560000198  2014        13894  
48300560000198  2013        18533 
48300560000198  2012        18534
NaN             NaN         NaN 
...

And I have this other dataframe:

df2:
                Name         Street  
index                                                                    
03374312000153  Yeap Co     Locc St 
54623827374939  Damn Co     Geez St 
37273829349299  Woohoo Co  Under St 
...

I need to select only the rows of df1 whose index values appear in df2.index and merge them with df2, so the result would look like this:

                year     revenues    Name      Street
index                                                                    
03374312000153  2010        25432 Yeap Co     Locc St
03374312000153  2009        25433 Yeap Co     Locc St
...

If I try:

df2=df2.merge(df1,left_index=True,right_index=True)

I get an error:

TypeError: type object argument after * must be a sequence, not map

If I try:

df2=df2.join(df1)

I get the same error as above.

Can someone help?

I see nothing wrong with what you're doing; both calls work for me on Pandas 0.19.2. If your version is older, that could be the issue. Check it with:

import pandas as pd
pd.__version__

How I built your dataframes:

idx1 = ['03374312000153', '03374312000153', '48300560000198', '48300560000198', '48300560000198']
df1 = pd.DataFrame({'year': pd.Series([2010, 2009, 2014, 2013, 2012], index=idx1),
                    'revenues': pd.Series([25432, 25433, 13894, 18533, 18534], index=idx1)})

idx2 = ['03374312000153', '54623827374939', '37273829349299']
df2 = pd.DataFrame({'Name': pd.Series(['Yeap Co', 'Damn Co', 'Woohoo Co'], index=idx2),
                    'Street': pd.Series(['Locc St', 'Geez St', 'Under St'], index=idx2)})

df2.merge(df1,left_index=True,right_index=True)


                   Name   Street  revenues  year
03374312000153  Yeap Co  Locc St     25432  2010
03374312000153  Yeap Co  Locc St     25433  2009
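
The question describes this as two steps: select the matching rows of df1, then merge. A minimal sketch of that reading, assuming the same sample data as above; the names `matched` and `result` are illustrative and not from the original post:

```python
import pandas as pd

# Sample data from the question, with the ids used as the index.
ids1 = ["03374312000153"] * 2 + ["48300560000198"] * 3
df1 = pd.DataFrame(
    {"year": [2010, 2009, 2014, 2013, 2012],
     "revenues": [25432, 25433, 13894, 18533, 18534]},
    index=ids1,
)
ids2 = ["03374312000153", "54623827374939", "37273829349299"]
df2 = pd.DataFrame(
    {"Name": ["Yeap Co", "Damn Co", "Woohoo Co"],
     "Street": ["Locc St", "Geez St", "Under St"]},
    index=ids2,
)

# Step 1: keep only the df1 rows whose index value occurs in df2.index.
matched = df1[df1.index.isin(df2.index)]

# Step 2: attach df2's columns by index. how="left" keeps every filtered row.
result = matched.join(df2, how="left")
print(result)
```

Here the column order follows the layout asked for in the question (year, revenues, Name, Street), because df1 is on the left side of the join.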

Some thoughts:

  • A non-unique index is generally discouraged, in part because writing the frame to an RDBMS table with a unique primary-key constraint will fail. In that case, keep the identifier in a regular column and join on that column instead of the index.
  • It's good practice to pass the 'how' option to merge or join explicitly (as @Wen did), rather than relying on the default.
  • It's good practice to generate a new dataframe from a join instead of writing over an old one. That way if the join fails, especially on a large dataframe, you don't have to re-create the previous dataframes.
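
A minimal sketch of the three points above taken together, assuming the identifier is moved into a regular column. The column name `company_id` and the result name `merged` are hypothetical and do not appear in the original post:

```python
import pandas as pd

# Same sample data, but the identifier is an ordinary column.
# "company_id" is a hypothetical name chosen for this illustration.
df1 = pd.DataFrame({
    "company_id": ["03374312000153", "03374312000153",
                   "48300560000198", "48300560000198", "48300560000198"],
    "year": [2010, 2009, 2014, 2013, 2012],
    "revenues": [25432, 25433, 13894, 18533, 18534],
})
df2 = pd.DataFrame({
    "company_id": ["03374312000153", "54623827374939", "37273829349299"],
    "Name": ["Yeap Co", "Damn Co", "Woohoo Co"],
    "Street": ["Locc St", "Geez St", "Under St"],
})

# Explicit how="inner" keeps only the ids present in both frames.
# The result is bound to a new name, so df1 and df2 are left untouched.
merged = df1.merge(df2, on="company_id", how="inner")
print(merged)
```

If the join goes wrong, df1 and df2 are still available unchanged, and the call can be retried with a different 'how' value.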
