Combine multiple columns with different date ranges in Python

I have multiple stock price dataframes with different date ranges (start dates are different) as indices. Below are three examples.

Dataframe #1:
Date
12/15/1980      0.3936
12/16/1980      0.3648
12/17/1980      0.3738
12/18/1980      0.3846
12/19/1980      0.4081
...             ... 
09/21/2018      151.2600

Dataframe #2:
Date
10/26/1993     0.7862
10/28/1993     0.7483
10/29/1993     0.7578
11/01/1993     0.7956
11/02/1993     0.7956
...            ...
09/21/2018     51.2000

Dataframe #3:
Date
10/26/1996      0.7862
10/28/1996      0.7483
10/29/1996      0.7578
11/01/1996      0.7956
11/02/1996      0.7956
...            ...
09/21/2018      36.5032

I would like to combine these dataframes into one table with the date as index. For stocks without data on a specific date, that "cell" would be blank.

I have several hundred of these dataframes. It would be greatly appreciated if someone could help me with this problem! Here is what I have tried so far:

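import pandas as pd

dflist = [df1, df2, df3 ...]

for df in dflist:
    # errors='coerce' turns dates that cannot be parsed into NaT
    df.index = pd.to_datetime(df.index, errors='coerce')

# pass the flat list itself, not a list nested inside another list
df_all = pd.concat(dflist, axis=1)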

Use concat with axis=1, then convert the combined index to datetimes:

dfs = [df1, df2, df3]
df = pd.concat(dfs, axis=1)
df.index = pd.to_datetime(df.index, format='%m/%d/%Y')
# if a sorted DatetimeIndex is needed
# df = df.sort_index()
print(df)
                   a        b        c
2018-09-21  151.2600  51.2000  36.5032
1993-10-26       NaN   0.7862      NaN
1996-10-26       NaN      NaN   0.7862
1993-10-28       NaN   0.7483      NaN
1996-10-28       NaN      NaN   0.7483
1993-10-29       NaN   0.7578      NaN
1996-10-29       NaN      NaN   0.7578
1993-11-01       NaN   0.7956      NaN
1996-11-01       NaN      NaN   0.7956
1993-11-02       NaN   0.7956      NaN
1996-11-02       NaN      NaN   0.7956
1980-12-15    0.3936      NaN      NaN
1980-12-16    0.3648      NaN      NaN
1980-12-17    0.3738      NaN      NaN
1980-12-18    0.3846      NaN      NaN
1980-12-19    0.4081      NaN      NaN
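
To see the effect of the commented-out sort_index() line, here is a minimal sketch. The two small frames are hypothetical stand-ins built from a few of the rows in the question, with column names a and b taken from the output above:

```python
import pandas as pd

# Hypothetical two-row frames; the dates are still plain strings at this point.
df1 = pd.DataFrame({"a": [0.3936, 151.26]}, index=["12/15/1980", "09/21/2018"])
df2 = pd.DataFrame({"b": [0.7862, 51.20]}, index=["10/26/1993", "09/21/2018"])

# Combine on the string index first, then parse the dates.
df = pd.concat([df1, df2], axis=1)
df.index = pd.to_datetime(df.index, format="%m/%d/%Y")

# Sort explicitly so the rows come out in chronological order.
df = df.sort_index()
print(df)
```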

Another solution is to use a list comprehension to create a DatetimeIndex for each dataframe before calling concat. The DatetimeIndex of the output is then also sorted:

dfs = [df1, df2, df3]
dfs1 = [x.set_index(pd.to_datetime(x.index, format='%m/%d/%Y')) for x in dfs]
df = pd.concat(dfs1, axis=1)
print(df)
                   a        b        c
1980-12-15    0.3936      NaN      NaN
1980-12-16    0.3648      NaN      NaN
1980-12-17    0.3738      NaN      NaN
1980-12-18    0.3846      NaN      NaN
1980-12-19    0.4081      NaN      NaN
1993-10-26       NaN   0.7862      NaN
1993-10-28       NaN   0.7483      NaN
1993-10-29       NaN   0.7578      NaN
1993-11-01       NaN   0.7956      NaN
1993-11-02       NaN   0.7956      NaN
1996-10-26       NaN      NaN   0.7862
1996-10-28       NaN      NaN   0.7483
1996-10-29       NaN      NaN   0.7578
1996-11-01       NaN      NaN   0.7956
1996-11-02       NaN      NaN   0.7956
2018-09-21  151.2600  51.2000  36.5032
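
The question asks for blank cells where a stock has no data. One way to get that, as a rough sketch, is to leave the values as NaN in the dataframe and only blank them out when displaying or exporting. The sample frames below are hypothetical and reuse a few rows from the question; the names combined and text are only for illustration:

```python
import pandas as pd

# Hypothetical sample frames built from a few rows shown in the question.
df1 = pd.DataFrame({"a": [0.3936, 0.3648, 151.26]},
                   index=["12/15/1980", "12/16/1980", "09/21/2018"])
df2 = pd.DataFrame({"b": [0.7862, 0.7483, 51.20]},
                   index=["10/26/1993", "10/28/1993", "09/21/2018"])
df3 = pd.DataFrame({"c": [0.7862, 0.7483, 36.5032]},
                   index=["10/26/1996", "10/28/1996", "09/21/2018"])
dfs = [df1, df2, df3]

# Parse each string index into a DatetimeIndex before concatenating.
dfs1 = [x.set_index(pd.to_datetime(x.index, format="%m/%d/%Y")) for x in dfs]

# axis=1 aligns rows on the index (outer join by default),
# so dates missing from a frame are filled with NaN.
combined = pd.concat(dfs1, axis=1).sort_index()

# Keep NaN in the data; render it as an empty string only for display.
text = combined.to_string(na_rep="")
print(text)
```

Keeping NaN in the dataframe leaves the columns numeric, so later calculations still work. The same na_rep argument is available on to_csv if the table is written to a file.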
