Pandas - Merging two DataFrames with different numbers of rows

Question

I have the following two DataFrames:

DF:

              value
period
2000-01-01    100
2000-04-01    200
2000-07-01    300
2000-10-01    400
2001-01-01    500

DF1:

              value
period
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700

This is the desired output:

DF:

              value
period
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700

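I have called set_index(['period']) on both DataFrames. I also tried a few things, including concat and a where statement after creating a new column, but nothing works as expected. My first DataFrame is the primary one. The second is the update: it should replace the corresponding values in the first and also add new records, if there are any.

How can I do this?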

Answer 1

You can use combine_first. If the dtype of either index is object, convert it with to_datetime first. This works well when every label in df1.index is also in df.index:

print (df.index.dtype)
object

print (df1.index.dtype)
object

df.index = pd.to_datetime(df.index)
df1.index = pd.to_datetime(df1.index)

df = df1.combine_first(df)
# cast back to int if integer columns are needed
#df = df1.combine_first(df).astype(int)
print (df)
            value
period           
2000-01-01  100.0
2000-04-01  200.0
2000-07-01  350.0
2000-10-01  450.0
2001-01-01  550.0
2001-04-01  600.0
2001-07-01  700.0
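
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the quarterly dates from the question can be rebuilt with pd.date_range and freq="QS"; the variable name merged is mine, not from the answer.

```python
import pandas as pd

# Rebuild the two frames from the question with datetime indexes.
df = pd.DataFrame(
    {"value": [100, 200, 300, 400, 500]},
    index=pd.date_range("2000-01-01", periods=5, freq="QS", name="period"),
)
df1 = pd.DataFrame(
    {"value": [350, 450, 550, 600, 700]},
    index=pd.date_range("2000-07-01", periods=5, freq="QS", name="period"),
)

# df1 takes priority; rows that exist only in df fill the gaps.
merged = df1.combine_first(df)

# Alignment can introduce NaN along the way, so depending on the pandas
# version the column may come back as float; cast if integers are needed.
merged = merged.astype(int)
print(merged)
```

The result covers the union of both indexes, so the new 2001 rows from df1 are included.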

If that is not the case, first filter by intersection:

df = df1.loc[df1.index.intersection(df.index)].combine_first(df)
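
Worth noting what the intersection filter does to the result: as far as I can tell, it restricts df1 to labels that already exist in df, so existing rows are updated but new records are not added. A small sketch under the same assumed sample data (the names common and updated_only are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [100, 200, 300, 400, 500]},
    index=pd.date_range("2000-01-01", periods=5, freq="QS", name="period"),
)
df1 = pd.DataFrame(
    {"value": [350, 450, 550, 600, 700]},
    index=pd.date_range("2000-07-01", periods=5, freq="QS", name="period"),
)

# Labels present in both frames.
common = df1.index.intersection(df.index)

# Only those rows of df1 are allowed to override df.
updated_only = df1.loc[common].combine_first(df).astype(int)
print(updated_only)
```

If the new 2001-04-01 and 2001-07-01 rows are wanted, as in the question, the unfiltered combine_first call is the one to use.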

Another solution, using numpy.setdiff1d and concat:

df = pd.concat([df.loc[np.setdiff1d(df.index, df1.index)], df1])
print (df)
            value
period           
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700

Answer 2

Is this what you want?

In [151]: pd.concat([df1, df.loc[df.index.difference(df1.index)]]).sort_index()
Out[151]:
            value
period
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700
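
If this is needed in more than one place, the same idea could be wrapped in a small function. This is only a sketch; upsert_by_index is a hypothetical name, not a pandas API, and the sample frames are rebuilt here under the assumption that pd.date_range with freq="QS" matches the question's dates.

```python
import pandas as pd

def upsert_by_index(base: pd.DataFrame, update: pd.DataFrame) -> pd.DataFrame:
    """Return base with rows replaced or added from update, matched on index."""
    # Labels that appear only in base; every other row comes from update.
    only_in_base = base.index.difference(update.index)
    return pd.concat([update, base.loc[only_in_base]]).sort_index()

df = pd.DataFrame(
    {"value": [100, 200, 300, 400, 500]},
    index=pd.date_range("2000-01-01", periods=5, freq="QS", name="period"),
)
df1 = pd.DataFrame(
    {"value": [350, 450, 550, 600, 700]},
    index=pd.date_range("2000-07-01", periods=5, freq="QS", name="period"),
)

result = upsert_by_index(df, df1)
print(result)
```

Because no alignment with NaN is involved, the integer dtype of value is kept without an extra cast.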

PS: make sure both indexes have the same dtype. It is best to convert them to datetime using pd.to_datetime().
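
A minimal sketch of that conversion, assuming both frames arrive with string labels in the index. The helper name ensure_datetime_index is made up for illustration.

```python
import pandas as pd

# Index labels stored as plain strings, e.g. after reading text data.
df = pd.DataFrame(
    {"value": [100, 200]},
    index=pd.Index(["2000-01-01", "2000-04-01"], name="period"),
)
df1 = pd.DataFrame(
    {"value": [250, 300]},
    index=pd.Index(["2000-04-01", "2000-07-01"], name="period"),
)

def ensure_datetime_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy whose index holds datetimes."""
    out = frame.copy()
    out.index = pd.to_datetime(out.index)
    return out

df = ensure_datetime_index(df)
df1 = ensure_datetime_index(df1)

# Both indexes now share one datetime dtype, so labels compare as dates.
print(df.index.dtype, df1.index.dtype)
```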

Answer 3

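Another option uses append and index.duplicated:

# DataFrame.append was removed in pandas 2.0; there, use pd.concat([df1, df])
d1 = df1.append(df)
# keep the first occurrence of each index label, so df1's values win
d1[~d1.index.duplicated()]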

            value
period           
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700
2000-01-01    100
2000-04-01    200
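
The output above is not in date order, because the rows of df1 come first. A sketch of the same approach with pd.concat and a final sort_index, assuming the question's data can be rebuilt with pd.date_range (stacked and result are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [100, 200, 300, 400, 500]},
    index=pd.date_range("2000-01-01", periods=5, freq="QS", name="period"),
)
df1 = pd.DataFrame(
    {"value": [350, 450, 550, 600, 700]},
    index=pd.date_range("2000-07-01", periods=5, freq="QS", name="period"),
)

# Put the update first so that its rows are the "first" occurrences.
stacked = pd.concat([df1, df])

# Drop later rows with a repeated label, then restore date order.
result = stacked[~stacked.index.duplicated(keep="first")].sort_index()
print(result)
```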

Answer 4

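I use pd.concat() to concatenate the DataFrames, then drop the duplicates to get the result. Here period is a regular column, not the index.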

df_con = pd.concat([df, df1])
df_con.drop_duplicates(subset="period",keep="last",inplace=True)
print(df_con)

       period  value
0  2000-01-01    100
1  2000-04-01    200
0  2000-07-01    350
1  2000-10-01    450
2  2001-01-01    550
3  2001-04-01    600
4  2001-07-01    700

To make period the index, just call set_index:

print(df_con.set_index("period"))

            value
period           
2000-01-01    100
2000-04-01    200
2000-07-01    350
2000-10-01    450
2001-01-01    550
2001-04-01    600
2001-07-01    700
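
The two steps can also be written as one chain without inplace=True. This is a sketch that assumes period is still an ordinary column in both frames; the sample data is rebuilt with pd.date_range, and the name result is mine.

```python
import pandas as pd

df = pd.DataFrame({
    "period": pd.date_range("2000-01-01", periods=5, freq="QS"),
    "value": [100, 200, 300, 400, 500],
})
df1 = pd.DataFrame({
    "period": pd.date_range("2000-07-01", periods=5, freq="QS"),
    "value": [350, 450, 550, 600, 700],
})

result = (
    pd.concat([df, df1], ignore_index=True)         # df first, update last
    .drop_duplicates(subset="period", keep="last")  # so the update wins
    .set_index("period")
    .sort_index()
)
print(result)
```

keep="last" only gives the update priority because df1 is the last frame passed to concat, so the order of the list matters.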

4 answers

Answer 1
4 Accepted 2017-05-08 20:56:34

Answer 2
3 2017-05-08 20:49:34

Answer 3
3 2017-05-08 21:43:13

Answer 4
0 2017-05-08 22:22:13
