Merging and updating multiple pandas dataframes with overlapping columns
Merging pandas DataFrames with overlapping indices and columns
I have two DataFrames, df1 and df2. Their data overlaps in both the index and the columns.
import yfinance as yf
symbols = ['QQQ', 'GBTC']
df1 = yf.download(symbols, start="2019-01-01", end="2019-01-07")
symbols = ['GBTC', 'TLT']
df2 = yf.download(symbols, start="2019-01-03", end="2019-01-15")
Here are the contents of df1:
Adj Close Close High Low \
GBTC QQQ GBTC QQQ GBTC QQQ GBTC
Date
2018-12-31 3.965 152.132996 3.965 154.259995 4.15 154.979996 3.95
2019-01-02 4.620 152.744461 4.620 154.880005 4.65 155.750000 4.13
2019-01-03 4.520 147.754257 4.520 149.820007 4.62 153.259995 4.32
2019-01-04 4.530 154.075851 4.530 156.229996 4.65 157.000000 4.41
Open Volume
QQQ GBTC QQQ GBTC QQQ
Date
2018-12-31 152.710007 4.140 154.470001 3829000 53015300
2019-01-02 150.880005 4.155 150.990005 2948200 58576700
2019-01-03 149.490005 4.325 152.600006 1503000 74820200
2019-01-04 151.740005 4.585 152.339996 2020700 74709300
Here is df2:
Adj Close Close High Low \
GBTC TLT GBTC TLT GBTC TLT GBTC
Date
2019-01-02 4.62 117.461304 4.62 122.150002 4.65 122.160004 4.13
2019-01-03 4.52 118.797966 4.52 123.540001 4.62 123.860001 4.32
2019-01-04 4.53 117.422844 4.53 122.110001 4.65 122.559998 4.41
2019-01-07 4.86 117.076653 4.86 121.750000 4.94 122.650002 4.74
2019-01-08 4.96 116.768936 4.96 121.430000 5.08 121.940002 4.84
2019-01-09 4.71 116.586197 4.71 121.239998 5.02 121.430000 4.63
2019-01-10 4.32 115.836174 4.32 120.459999 4.46 121.410004 4.16
2019-01-11 4.32 116.288139 4.32 120.930000 4.49 121.269997 4.25
2019-01-14 4.47 115.855431 4.47 120.480003 4.55 121.010002 4.14
Open Volume
TLT GBTC TLT GBTC TLT
Date
2019-01-02 121.339996 4.155 121.660004 2948200 19841500
2019-01-03 122.230003 4.325 122.290001 1503000 21187000
2019-01-04 121.650002 4.585 122.339996 2020700 12970200
2019-01-07 121.620003 4.740 122.620003 2676600 8498100
2019-01-08 121.389999 4.895 121.690002 2653200 7737100
2019-01-09 120.800003 5.015 121.260002 2778000 9349200
2019-01-10 120.339996 4.455 121.279999 3799800 8222900
2019-01-11 120.680000 4.410 120.830002 1218500 5786900
2019-01-14 120.239998 4.145 120.900002 2581600 6730500
How can I merge df1 and df2 into df3 so that df3 has the following contents?
> df3
Adj Close Close \
GBTC QQQ TLT GBTC QQQ TLT
Date
2018-12-31 3.965 152.132996 NaN 3.965 154.259995 NaN
2019-01-02 4.620 152.744461 117.461304 4.620 154.880005 122.150002
2019-01-03 4.520 147.754257 118.797966 4.520 149.820007 123.540001
2019-01-04 4.530 154.075851 117.422844 4.530 156.229996 122.110001
2019-01-07 4.860 NaN 117.076653 4.860 NaN 121.750000
2019-01-08 4.960 NaN 116.768936 4.960 NaN 121.430000
2019-01-09 4.710 NaN 116.586197 4.710 NaN 121.239998
2019-01-10 4.320 NaN 115.836174 4.320 NaN 120.459999
2019-01-11 4.320 NaN 116.288139 4.320 NaN 120.930000
2019-01-14 4.470 NaN 115.855431 4.470 NaN 120.480003
High Low Open \
GBTC QQQ TLT GBTC QQQ TLT GBTC
Date
2018-12-31 4.15 154.979996 NaN 3.95 152.710007 NaN 4.140
2019-01-02 4.65 155.750000 122.160004 4.13 150.880005 121.339996 4.155
2019-01-03 4.62 153.259995 123.860001 4.32 149.490005 122.230003 4.325
2019-01-04 4.65 157.000000 122.559998 4.41 151.740005 121.650002 4.585
2019-01-07 4.94 NaN 122.650002 4.74 NaN 121.620003 4.740
2019-01-08 5.08 NaN 121.940002 4.84 NaN 121.389999 4.895
2019-01-09 5.02 NaN 121.430000 4.63 NaN 120.800003 5.015
2019-01-10 4.46 NaN 121.410004 4.16 NaN 120.339996 4.455
2019-01-11 4.49 NaN 121.269997 4.25 NaN 120.680000 4.410
2019-01-14 4.55 NaN 121.010002 4.14 NaN 120.239998 4.145
Volume
QQQ TLT GBTC QQQ TLT
Date
2018-12-31 154.470001 NaN 3829000 53015300.0 NaN
2019-01-02 150.990005 121.660004 2948200 58576700.0 19841500.0
2019-01-03 152.600006 122.290001 1503000 74820200.0 21187000.0
2019-01-04 152.339996 122.339996 2020700 74709300.0 12970200.0
2019-01-07 NaN 122.620003 2676600 NaN 8498100.0
2019-01-08 NaN 121.690002 2653200 NaN 7737100.0
2019-01-09 NaN 121.260002 2778000 NaN 9349200.0
2019-01-10 NaN 121.279999 3799800 NaN 8222900.0
2019-01-11 NaN 120.830002 1218500 NaN 5786900.0
2019-01-14 NaN 120.900002 2581600 NaN 6730500.0
df4 = df1.append(df2).drop_duplicates().sort_index()
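This returns a DataFrame similar to df3, but df4 still differs from df3: each date in the overlap (2019-01-02 through 2019-01-04) appears twice, once with the QQQ values and once with the TLT values.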
> df4
Adj Close Close \
GBTC QQQ TLT GBTC QQQ TLT
Date
2018-12-31 3.965 152.132996 NaN 3.965 154.259995 NaN
2019-01-02 4.620 152.744461 NaN 4.620 154.880005 NaN
2019-01-02 4.620 NaN 117.461304 4.620 NaN 122.150002
2019-01-03 4.520 147.754257 NaN 4.520 149.820007 NaN
2019-01-03 4.520 NaN 118.797966 4.520 NaN 123.540001
2019-01-04 4.530 154.075851 NaN 4.530 156.229996 NaN
2019-01-04 4.530 NaN 117.422844 4.530 NaN 122.110001
2019-01-07 4.860 NaN 117.076653 4.860 NaN 121.750000
2019-01-08 4.960 NaN 116.768936 4.960 NaN 121.430000
2019-01-09 4.710 NaN 116.586197 4.710 NaN 121.239998
2019-01-10 4.320 NaN 115.836174 4.320 NaN 120.459999
2019-01-11 4.320 NaN 116.288139 4.320 NaN 120.930000
2019-01-14 4.470 NaN 115.855431 4.470 NaN 120.480003
High Low Open \
GBTC QQQ TLT GBTC QQQ TLT GBTC
Date
2018-12-31 4.15 154.979996 NaN 3.95 152.710007 NaN 4.140
2019-01-02 4.65 155.750000 NaN 4.13 150.880005 NaN 4.155
2019-01-02 4.65 NaN 122.160004 4.13 NaN 121.339996 4.155
2019-01-03 4.62 153.259995 NaN 4.32 149.490005 NaN 4.325
2019-01-03 4.62 NaN 123.860001 4.32 NaN 122.230003 4.325
2019-01-04 4.65 157.000000 NaN 4.41 151.740005 NaN 4.585
2019-01-04 4.65 NaN 122.559998 4.41 NaN 121.650002 4.585
2019-01-07 4.94 NaN 122.650002 4.74 NaN 121.620003 4.740
2019-01-08 5.08 NaN 121.940002 4.84 NaN 121.389999 4.895
2019-01-09 5.02 NaN 121.430000 4.63 NaN 120.800003 5.015
2019-01-10 4.46 NaN 121.410004 4.16 NaN 120.339996 4.455
2019-01-11 4.49 NaN 121.269997 4.25 NaN 120.680000 4.410
2019-01-14 4.55 NaN 121.010002 4.14 NaN 120.239998 4.145
Volume
QQQ TLT GBTC QQQ TLT
Date
2018-12-31 154.470001 NaN 3829000 53015300.0 NaN
2019-01-02 150.990005 NaN 2948200 58576700.0 NaN
2019-01-02 NaN 121.660004 2948200 NaN 19841500.0
2019-01-03 152.600006 NaN 1503000 74820200.0 NaN
2019-01-03 NaN 122.290001 1503000 NaN 21187000.0
2019-01-04 152.339996 NaN 2020700 74709300.0 NaN
2019-01-04 NaN 122.339996 2020700 NaN 12970200.0
2019-01-07 NaN 122.620003 2676600 NaN 8498100.0
2019-01-08 NaN 121.690002 2653200 NaN 7737100.0
2019-01-09 NaN 121.260002 2778000 NaN 9349200.0
2019-01-10 NaN 121.279999 3799800 NaN 8222900.0
2019-01-11 NaN 120.830002 1218500 NaN 5786900.0
2019-01-14 NaN 120.900002 2581600 NaN 6730500.0
I'm not sure if this is exactly what you want, but this appears to produce df3:
pd.concat([df1.stack(), df2.stack()]).sort_values(by='Date').drop_duplicates().unstack()
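The MultiIndex in the column header can be "stacked" into the rows, after which the two DataFrames can be combined with concat. Because you are fetching the same symbol over overlapping date ranges in both DataFrames, the duplicates need to be dropped. To get back to the original format, simply unstack the result. The pandas documentation on reshaping explains the stack and unstack operations in more detail: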
https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html
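As a rough illustration of the stack, concat, and unstack steps described above, the sketch below uses small hand-built frames instead of yfinance downloads. The two-field column layout and the names long_form and alt are assumptions made for the example; the numbers are copied from the Close and Volume columns shown in the question. It drops duplicates by (Date, ticker) label rather than by row values, which is a variation on the drop_duplicates() call in the answer, and it also shows DataFrame.combine_first as an alternative that is not part of the original answer.

```python
import pandas as pd

# Hypothetical miniature versions of df1 and df2: two fields, overlapping
# dates, and one shared ticker (GBTC).
fields = ["Close", "Volume"]
dates1 = pd.DatetimeIndex(["2019-01-02", "2019-01-03"], name="Date")
dates2 = pd.DatetimeIndex(["2019-01-03", "2019-01-04"], name="Date")

df1 = pd.DataFrame(
    [[4.62, 154.880005, 2948200.0, 58576700.0],
     [4.52, 149.820007, 1503000.0, 74820200.0]],
    index=dates1,
    columns=pd.MultiIndex.from_product([fields, ["GBTC", "QQQ"]]),
)
df2 = pd.DataFrame(
    [[4.52, 123.540001, 1503000.0, 21187000.0],
     [4.53, 122.110001, 2020700.0, 12970200.0]],
    index=dates2,
    columns=pd.MultiIndex.from_product([fields, ["GBTC", "TLT"]]),
)

# Move the ticker level of the columns into the row index,
# so each row is labelled by (Date, ticker).
long_form = pd.concat([df1.stack(), df2.stack()])

# GBTC on 2019-01-03 now appears twice; keep the first copy of each label.
long_form = long_form[~long_form.index.duplicated(keep="first")]

# Pivot the ticker level back into the columns and order the dates.
df3 = long_form.unstack().sort_index()

# Alternative (not from the original answer): start from df1 and
# fill in the dates and tickers it lacks from df2.
alt = df1.combine_first(df2)
```

In both results each date appears once, and a ticker that exists in only one frame is NaN on the dates that only the other frame covers. Newer pandas versions may emit a FutureWarning for stack() without future_stack=True; the result here is unaffected because the sample data contains no missing values.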