Merging multiple pandas DataFrames

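I am struggling with the following problem. I have many separate DataFrames (50), each containing one specific feature (e.g. price, standard deviation, etc.) for a number of stocks, so something like this: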

import pandas as pd
import numpy as np

dates = pd.date_range('20130101',periods=6)

df1 = pd.DataFrame(np.random.randn(6,4),index=dates,\
columns=('AAPL','MSFT','TSLA','GE'))

df2 = pd.DataFrame(np.random.randn(6,4),index=dates,\
columns=('AAPL','MSFT','TSLA','GE'))

df3 = pd.DataFrame(np.random.randn(6,4),index=dates,\
columns=('AAPL','MSFT','TSLA','GE'))

df4 = pd.DataFrame(np.random.randn(6,4),index=dates,\
columns=('AAPL','MSFT','TSLA','GE'))

Now I would like to merge them so that I get one DataFrame per stock, containing all the features for that particular stock, so something like this:

aapl = pd.DataFrame(np.random.randn(6,4),index=dates,\
columns=('AAPL1','AAPL2','AAPL3','AAPL4'))

msft = pd.DataFrame(np.random.randn(6,4),index=dates,\
columns=('MSFT1','MSFT2','MSFT3','MSFT4'))

tsla = pd.DataFrame(np.random.randn(6,4),index=dates,\
columns=('TSLA1','TSLA2','TSLA3','TSLA4'))

ge = pd.DataFrame(np.random.randn(6,4),index=dates,\
columns=('GE1','GE2','GE3','GE4'))

I would use concat:

In [11]: res = pd.concat([df1, df2, df3, df4], keys=[1, 2, 3, 4], axis=1)

In [12]: res
Out[12]:
                   1                                       2                                       3                                       4
                AAPL      MSFT      TSLA        GE      AAPL      MSFT      TSLA        GE      AAPL      MSFT      TSLA        GE      AAPL      MSFT      TSLA        GE
2013-01-01  0.144764  1.292692 -1.303908 -0.843892 -1.104683 -1.178507  0.898648 -0.626209  0.492292  0.147169  1.814729  0.562406 -0.121656  0.865116  0.430813 -0.326225
2013-01-02 -0.163063  0.019601 -2.565271  0.708233  0.317464 -2.574969 -0.080129 -1.176806  0.045253  0.684745 -1.062797 -0.483389 -0.579194  0.401920 -0.393240  0.113734
2013-01-03  0.213592 -0.732072 -0.942323  0.191418 -0.962551 -0.027296  0.665155  2.775983 -0.627107 -0.015927  0.939107  0.239057  0.548166 -1.753082 -0.007525  1.771812
2013-01-04  1.067464 -0.331888  0.638843 -1.197937  0.925848  2.273798  0.646925 -2.910974  0.531653 -0.748255  0.262995  0.077923 -0.867982  1.174089  0.183573  0.263749
2013-01-05  0.873720 -0.816305  0.270330 -1.543169  0.116701 -1.392711  1.519368 -0.601046 -0.154348 -0.345653 -0.785385 -0.095604  1.351421  0.192520  0.802445  2.107376
2013-01-06 -0.781975  1.007111 -2.555165 -1.866207  1.480997  0.212057  1.053570 -0.798790 -0.785660 -0.853178 -2.274432  0.481971 -1.555876 -0.928069 -0.408319  0.270534
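
With 50 frames, numeric keys can become hard to follow. As a minimal sketch, assuming each frame can be labeled with a feature name (the names `price` and `std` and the `features` dict below are hypothetical, not from the question), a dict can be passed to `pd.concat` so that its keys become the outer column level:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
tickers = ["AAPL", "MSFT", "TSLA", "GE"]

# Hypothetical feature names; the question only mentions "price, standard deviation, etc."
features = {
    name: pd.DataFrame(np.random.randn(6, 4), index=dates, columns=tickers)
    for name in ["price", "std"]
}

# The dict keys become the outer level of the column MultiIndex.
res = pd.concat(features, axis=1)
```

Selecting `res["price"]` then gives back the original price frame, which can make later lookups easier to read than `res[1]`.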

Then you can pull out AAPL using xs:

In [13]: res.xs("AAPL", level=1, axis=1)
Out[13]:
                   1         2         3         4
2013-01-01  0.144764 -1.104683  0.492292 -0.121656
2013-01-02 -0.163063  0.317464  0.045253 -0.579194
2013-01-03  0.213592 -0.962551 -0.627107  0.548166
2013-01-04  1.067464  0.925848  0.531653 -0.867982
2013-01-05  0.873720  0.116701 -0.154348  1.351421
2013-01-06 -0.781975  1.480997 -0.785660 -1.555876
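
The question asks for columns named `AAPL1` to `AAPL4`, whereas `xs` leaves the keys `1` to `4` as column labels. A small sketch of one way to get the requested names, assuming the same setup as above (the `frames` list is just a stand-in for `df1` to `df4`):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
tickers = ["AAPL", "MSFT", "TSLA", "GE"]

# Stand-ins for df1..df4 from the question.
frames = [
    pd.DataFrame(np.random.randn(6, 4), index=dates, columns=tickers)
    for _ in range(4)
]

res = pd.concat(frames, keys=[1, 2, 3, 4], axis=1)

# Cross-section for one ticker; the remaining column labels are the keys 1..4.
aapl = res.xs("AAPL", level=1, axis=1)

# Prefix each key with the ticker to get AAPL1..AAPL4.
aapl.columns = [f"AAPL{key}" for key in aapl.columns]
```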

Perhaps a nicer option is to get a dict of the groups:

In [21]: d = dict(iter(res.groupby(level=1, axis=1)))

In [22]: d["AAPL"]
Out[22]:
                   1         2         3         4
                AAPL      AAPL      AAPL      AAPL
2013-01-01  0.144764 -1.104683  0.492292 -0.121656
2013-01-02 -0.163063  0.317464  0.045253 -0.579194
2013-01-03  0.213592 -0.962551 -0.627107  0.548166
2013-01-04  1.067464  0.925848  0.531653 -0.867982
2013-01-05  0.873720  0.116701 -0.154348  1.351421
2013-01-06 -0.781975  1.480997 -0.785660 -1.555876
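
Note that newer pandas releases (2.1 and later, as far as I know) warn that `groupby(..., axis=1)` is deprecated. A hedged sketch of an alternative that avoids it, building the same kind of dict with one `xs` call per ticker and keeping both column levels via `drop_level=False` (the `frames` list again stands in for `df1` to `df4`):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
tickers = ["AAPL", "MSFT", "TSLA", "GE"]

# Stand-ins for df1..df4 from the question.
frames = [
    pd.DataFrame(np.random.randn(6, 4), index=dates, columns=tickers)
    for _ in range(4)
]

res = pd.concat(frames, keys=[1, 2, 3, 4], axis=1)

# Unique ticker labels from the inner column level.
level_tickers = res.columns.get_level_values(1).unique()

# One DataFrame per ticker; drop_level=False keeps the (key, ticker) columns.
d = {
    ticker: res.xs(ticker, level=1, axis=1, drop_level=False)
    for ticker in level_tickers
}
```

Here `d["AAPL"]` has the same two-level columns as the groupby-based result above; dropping `drop_level=False` would leave only the keys `1` to `4`.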
