
Merging pandas DataFrames with overlapping indices and columns

I have two DataFrames, df1 and df2, whose data overlaps in both the index and the columns.

import yfinance as yf

symbols = ['QQQ', 'GBTC']
df1 = yf.download(symbols, start="2019-01-01", end="2019-01-07")

symbols = ['GBTC', 'TLT']
df2 = yf.download(symbols, start="2019-01-03", end="2019-01-15")

Here are the contents of df1:

           Adj Close              Close              High               Low  \
                GBTC         QQQ   GBTC         QQQ  GBTC         QQQ  GBTC   
Date                                                                          
2018-12-31     3.965  152.132996  3.965  154.259995  4.15  154.979996  3.95   
2019-01-02     4.620  152.744461  4.620  154.880005  4.65  155.750000  4.13   
2019-01-03     4.520  147.754257  4.520  149.820007  4.62  153.259995  4.32   
2019-01-04     4.530  154.075851  4.530  156.229996  4.65  157.000000  4.41   

                         Open               Volume            
                   QQQ   GBTC         QQQ     GBTC       QQQ  
Date                                                          
2018-12-31  152.710007  4.140  154.470001  3829000  53015300  
2019-01-02  150.880005  4.155  150.990005  2948200  58576700  
2019-01-03  149.490005  4.325  152.600006  1503000  74820200  
2019-01-04  151.740005  4.585  152.339996  2020700  74709300 

And here is df2:

           Adj Close             Close              High               Low  \
                GBTC         TLT  GBTC         TLT  GBTC         TLT  GBTC   
Date                                                                         
2019-01-02      4.62  117.461304  4.62  122.150002  4.65  122.160004  4.13   
2019-01-03      4.52  118.797966  4.52  123.540001  4.62  123.860001  4.32   
2019-01-04      4.53  117.422844  4.53  122.110001  4.65  122.559998  4.41   
2019-01-07      4.86  117.076653  4.86  121.750000  4.94  122.650002  4.74   
2019-01-08      4.96  116.768936  4.96  121.430000  5.08  121.940002  4.84   
2019-01-09      4.71  116.586197  4.71  121.239998  5.02  121.430000  4.63   
2019-01-10      4.32  115.836174  4.32  120.459999  4.46  121.410004  4.16   
2019-01-11      4.32  116.288139  4.32  120.930000  4.49  121.269997  4.25   
2019-01-14      4.47  115.855431  4.47  120.480003  4.55  121.010002  4.14   

                         Open               Volume            
                   TLT   GBTC         TLT     GBTC       TLT  
Date                                                          
2019-01-02  121.339996  4.155  121.660004  2948200  19841500  
2019-01-03  122.230003  4.325  122.290001  1503000  21187000  
2019-01-04  121.650002  4.585  122.339996  2020700  12970200  
2019-01-07  121.620003  4.740  122.620003  2676600   8498100  
2019-01-08  121.389999  4.895  121.690002  2653200   7737100  
2019-01-09  120.800003  5.015  121.260002  2778000   9349200  
2019-01-10  120.339996  4.455  121.279999  3799800   8222900  
2019-01-11  120.680000  4.410  120.830002  1218500   5786900  
2019-01-14  120.239998  4.145  120.900002  2581600   6730500 

How can I merge df1 and df2 into df3 so that df3 has the following contents?

> df3
         Adj Close                          Close                          \
                GBTC         QQQ         TLT   GBTC         QQQ         TLT   
Date                                                                          
2018-12-31     3.965  152.132996         NaN  3.965  154.259995         NaN 
2019-01-02     4.620  152.744461  117.461304  4.620  154.880005  122.150002    
2019-01-03     4.520  147.754257  118.797966  4.520  149.820007  123.540001 
2019-01-04     4.530  154.075851  117.422844  4.530  156.229996  122.110001   
2019-01-07     4.860         NaN  117.076653  4.860         NaN  121.750000   
2019-01-08     4.960         NaN  116.768936  4.960         NaN  121.430000   
2019-01-09     4.710         NaN  116.586197  4.710         NaN  121.239998   
2019-01-10     4.320         NaN  115.836174  4.320         NaN  120.459999   
2019-01-11     4.320         NaN  116.288139  4.320         NaN  120.930000   
2019-01-14     4.470         NaN  115.855431  4.470         NaN  120.480003   

            High                           Low                           Open  \
            GBTC         QQQ         TLT  GBTC         QQQ         TLT   GBTC   
Date                                                                            
2018-12-31  4.15  154.979996         NaN  3.95  152.710007         NaN  4.140    
2019-01-02  4.65  155.750000  122.160004  4.13  150.880005  121.339996  4.155     
2019-01-03  4.62  153.259995  123.860001  4.32  149.490005  122.230003  4.325    
2019-01-04  4.65  157.000000  122.559998  4.41  151.740005  121.650002  4.585   
2019-01-07  4.94         NaN  122.650002  4.74         NaN  121.620003  4.740   
2019-01-08  5.08         NaN  121.940002  4.84         NaN  121.389999  4.895   
2019-01-09  5.02         NaN  121.430000  4.63         NaN  120.800003  5.015   
2019-01-10  4.46         NaN  121.410004  4.16         NaN  120.339996  4.455   
2019-01-11  4.49         NaN  121.269997  4.25         NaN  120.680000  4.410   
2019-01-14  4.55         NaN  121.010002  4.14         NaN  120.239998  4.145   

                                     Volume                          
                   QQQ         TLT     GBTC         QQQ         TLT  
Date                                                                 
2018-12-31  154.470001         NaN  3829000  53015300.0         NaN  
2019-01-02  150.990005  121.660004  2948200  58576700.0  19841500.0  
2019-01-03  152.600006  122.290001  1503000  74820200.0  21187000.0  
2019-01-04  152.339996  122.339996  2020700  74709300.0  12970200.0  
2019-01-07         NaN  122.620003  2676600         NaN   8498100.0  
2019-01-08         NaN  121.690002  2653200         NaN   7737100.0  
2019-01-09         NaN  121.260002  2778000         NaN   9349200.0  
2019-01-10         NaN  121.279999  3799800         NaN   8222900.0  
2019-01-11         NaN  120.830002  1218500         NaN   5786900.0  
2019-01-14         NaN  120.900002  2581600         NaN   6730500.0 

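df4 = df1.append(df2).drop_duplicates().sort_index() returns a DataFrame similar to df3.

However, df3 and df4 are still different: df4 contains two rows for each date that appears in both df1 and df2.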

> df4
           Adj Close                          Close                          \
                GBTC         QQQ         TLT   GBTC         QQQ         TLT   
Date                                                                          
2018-12-31     3.965  152.132996         NaN  3.965  154.259995         NaN   
2019-01-02     4.620  152.744461         NaN  4.620  154.880005         NaN   
2019-01-02     4.620         NaN  117.461304  4.620         NaN  122.150002   
2019-01-03     4.520  147.754257         NaN  4.520  149.820007         NaN   
2019-01-03     4.520         NaN  118.797966  4.520         NaN  123.540001   
2019-01-04     4.530  154.075851         NaN  4.530  156.229996         NaN   
2019-01-04     4.530         NaN  117.422844  4.530         NaN  122.110001   
2019-01-07     4.860         NaN  117.076653  4.860         NaN  121.750000   
2019-01-08     4.960         NaN  116.768936  4.960         NaN  121.430000   
2019-01-09     4.710         NaN  116.586197  4.710         NaN  121.239998   
2019-01-10     4.320         NaN  115.836174  4.320         NaN  120.459999   
2019-01-11     4.320         NaN  116.288139  4.320         NaN  120.930000   
2019-01-14     4.470         NaN  115.855431  4.470         NaN  120.480003   

            High                           Low                           Open  \
            GBTC         QQQ         TLT  GBTC         QQQ         TLT   GBTC   
Date                                                                            
2018-12-31  4.15  154.979996         NaN  3.95  152.710007         NaN  4.140   
2019-01-02  4.65  155.750000         NaN  4.13  150.880005         NaN  4.155   
2019-01-02  4.65         NaN  122.160004  4.13         NaN  121.339996  4.155   
2019-01-03  4.62  153.259995         NaN  4.32  149.490005         NaN  4.325   
2019-01-03  4.62         NaN  123.860001  4.32         NaN  122.230003  4.325   
2019-01-04  4.65  157.000000         NaN  4.41  151.740005         NaN  4.585   
2019-01-04  4.65         NaN  122.559998  4.41         NaN  121.650002  4.585   
2019-01-07  4.94         NaN  122.650002  4.74         NaN  121.620003  4.740   
2019-01-08  5.08         NaN  121.940002  4.84         NaN  121.389999  4.895   
2019-01-09  5.02         NaN  121.430000  4.63         NaN  120.800003  5.015   
2019-01-10  4.46         NaN  121.410004  4.16         NaN  120.339996  4.455   
2019-01-11  4.49         NaN  121.269997  4.25         NaN  120.680000  4.410   
2019-01-14  4.55         NaN  121.010002  4.14         NaN  120.239998  4.145   

                                     Volume                          
                   QQQ         TLT     GBTC         QQQ         TLT  
Date                                                                 
2018-12-31  154.470001         NaN  3829000  53015300.0         NaN  
2019-01-02  150.990005         NaN  2948200  58576700.0         NaN  
2019-01-02         NaN  121.660004  2948200         NaN  19841500.0  
2019-01-03  152.600006         NaN  1503000  74820200.0         NaN  
2019-01-03         NaN  122.290001  1503000         NaN  21187000.0  
2019-01-04  152.339996         NaN  2020700  74709300.0         NaN  
2019-01-04         NaN  122.339996  2020700         NaN  12970200.0  
2019-01-07         NaN  122.620003  2676600         NaN   8498100.0  
2019-01-08         NaN  121.690002  2653200         NaN   7737100.0  
2019-01-09         NaN  121.260002  2778000         NaN   9349200.0  
2019-01-10         NaN  121.279999  3799800         NaN   8222900.0  
2019-01-11         NaN  120.830002  1218500         NaN   5786900.0  
2019-01-14         NaN  120.900002  2581600         NaN   6730500.0
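
The duplicated dates in df4 can be reproduced on a much smaller example. This is only a sketch under assumptions: the two frames below are made-up stand-ins for the yfinance downloads (one field, two symbols each), and pd.concat is used in place of DataFrame.append, which was removed in pandas 2.0. drop_duplicates compares whole rows, and the df1 row and the df2 row for a shared date differ in their QQQ and TLT cells, so neither is dropped.

```python
import pandas as pd

# Made-up stand-ins for the yfinance downloads (values are illustrative).
df1 = pd.DataFrame(
    [[4.62, 154.88], [4.52, 149.82]],
    index=pd.DatetimeIndex(["2019-01-02", "2019-01-03"], name="Date"),
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "QQQ"]]),
)
df2 = pd.DataFrame(
    [[4.52, 123.54], [4.53, 122.11]],
    index=pd.DatetimeIndex(["2019-01-03", "2019-01-04"], name="Date"),
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "TLT"]]),
)

# Row-wise concatenation: columns are unioned, rows are simply appended.
df4 = pd.concat([df1, df2]).drop_duplicates().sort_index()

# The two 2019-01-03 rows are not identical (one has NaN for TLT, the
# other has NaN for QQQ), so drop_duplicates keeps both of them.
print(df4)
print("duplicated dates:", df4.index.duplicated().sum())
```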

Not sure if this is exactly what you want, but this appears to produce df3:

pd.concat([df1.stack(), df2.stack()]).sort_values(by='Date').drop_duplicates().unstack()

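Here is the pandas documentation on reshaping, which may give more insight into the stack and unstack operations. The MultiIndex in the column header can be "stacked" into the rows, and the two DataFrames are then joined with concat. Since you are fetching the same symbol in both DataFrames over an overlapping date range, the duplicates need to be dropped. To go back to the original format, just unstack the result.

https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html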
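
The stack, concat, drop duplicates, unstack steps can be tried without a network connection. The sketch below makes several assumptions: the two frames are made-up stand-ins for the yfinance downloads, and duplicates are dropped by index label (date and symbol) rather than by row values as in the one-liner above. That is a deliberate variation, so two different dates that happen to hold identical values are not collapsed. The final lines compare the result with DataFrame.combine_first, a different technique that may be worth considering when df1 should take precedence wherever both frames have a value.

```python
import pandas as pd

# Made-up stand-ins for the yfinance downloads (values are illustrative).
df1 = pd.DataFrame(
    [[4.62, 154.88], [4.52, 149.82]],
    index=pd.DatetimeIndex(["2019-01-02", "2019-01-03"], name="Date"),
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "QQQ"]]),
)
df2 = pd.DataFrame(
    [[4.52, 123.54], [4.53, 122.11]],
    index=pd.DatetimeIndex(["2019-01-03", "2019-01-04"], name="Date"),
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "TLT"]]),
)

# 1. Move the symbol level of the columns into the row index,
#    giving one row per (Date, symbol) pair, then concatenate.
stacked = pd.concat([df1.stack(), df2.stack()])

# 2. Drop repeated (Date, symbol) labels, keeping the first occurrence.
#    (Variation on the answer: it used drop_duplicates() on the values.)
stacked = stacked[~stacked.index.duplicated(keep="first")]

# 3. Move the symbol level back into the columns and sort by date.
df3 = stacked.unstack().sort_index()
print(df3)

# Alternative for comparison: union of both frames, preferring df1's values.
alt = df1.combine_first(df2)
print(alt)
```

On this toy data both approaches give one row per date, with NaN where a symbol has no data for that date.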
