Pandas - Merge two dataframes with different number of rows

Question

I have the following two dataframes:

df:

              value
period
2000-01-01    100
2000-04-01    200
2000-07-01    300
2000-10-01    400
2001-01-01    500

df1:

              value
period
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700

This is the desired output:

df:

              value
period
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700

I have set_index(['period']) on both df1 and df2. I also tried few things including concat and where statement after creating new column but notting works as expected. My first dataframe is primary. The second is kind of update. It should replace the corresponding values in the first one and in the same time add new records if any available.

How I can do this?

Answer 1

You can use combine_first , also if dtype of some index is object convert to_datetime which works nice if always df1.index is in df.index :

print (df.index.dtype)
object

print (df1.index.dtype)
object

df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)

df = df1.combine_first(df)
#if necessary int columns
#df = df1.combine_first(df).astype(int)
print (df)
            value
period           
2000-01-01  100.0
2000-04-01  200.0
2000-07-01  350.0
2000-10-01  450.0
2001-01-01  550.0
2001-04-01  600.0
2001-07-01  700.0

If not, then is necessary filter by intersection first:

df = df1.loc[df1.index.intersection(df.index)].combine_first(df)

Another solution with numpy.setdiff1d and concat

df = pd.concat([df.loc[np.setdiff1d(df.index, df1.index)], df1])
print (df)
            value
period           
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700

Answer 2

Is that what you want?

In [151]: pd.concat([df1, df.loc[df.index.difference(df1.index)]]).sort_index()
Out[151]:
            value
period
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700

PS make sure that both indices are of the same dtype - it's better to convert them to datetime dtype, using pd.to_datetime() method

Answer 3

Another option with append and drop_duplicates

d1 = df1.append(df)
d1[~d1.index.duplicated()]

            value
period           
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700
2000-01-01    100
2000-04-01    200

Answer 4

I used the pd.concat() function to concatenate the data frames, then dropped the duplicates to get the results.

df_con = pd.concat([df, df1])
df_con.drop_duplicates(subset="period",keep="last",inplace=True)
print(df_con)

       period  value
0  2000-01-01    100
1  2000-04-01    200
0  2000-07-01    350
1  2000-10-01    450
2  2001-01-01    550
3  2001-04-01    600
4  2001-07-01    700

To set "period" back as an index just set the index,

print(df_con.set_index("period"))

            value
period           
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700

Pandas - Merge two dataframes with different number of rows

Question

4 answers

solution1
4 ACCPTED 2017-05-08 20:56:34

solution2
3 2017-05-08 20:49:34

solution3
3 2017-05-08 21:43:13

solution4
0 2017-05-08 22:22:13

Pandas - Merge two dataframes with different number of rows

Question

4 answers

solution1 4 ACCPTED 2017-05-08 20:56:34

solution2 3 2017-05-08 20:49:34

solution3 3 2017-05-08 21:43:13

solution4 0 2017-05-08 22:22:13

solution1
4 ACCPTED 2017-05-08 20:56:34

solution2
3 2017-05-08 20:49:34

solution3
3 2017-05-08 21:43:13

solution4
0 2017-05-08 22:22:13