
How to combine two pandas DataFrames with the same format but different-length indexes

I have multiple DataFrames. Each one has a time index, all in the same format (datetime). The problem is that some DataFrames run from 2000 to 2004 while others run from 2001 to 2004, and so on, and I do not know which DataFrame covers the longest period. For example,

df1
             companyA
2000-01-01   10    
2000-02-01   13
2000-03-01   21
2000-04-01   11
2000-05-01   9
2000-06-01   18
      .
      .
      .
2017-09-01   3
2017-10-01   14
2017-11-01   20
2017-12-01   5

df2
             companyB
2004-01-01   19    
2004-02-01   32
2004-03-01   17
2004-04-01   42
2004-05-01   29
2004-06-01   31
      .
      .
      .
2017-09-01   43
2017-10-01   54
2017-11-01   30
2017-12-01   45

I want to make this into

df1
             companyA    companyB    companyC...
2000-01-01   10          0           0
2000-02-01   13          0           0
2000-03-01   21          0           0
2000-04-01   11          0           0
2000-05-01   9           0           0
2000-06-01   18          0           0
      .
      .
      .
2004-01-01   19          19          0
2004-02-01   12          32          0
2004-03-01   17          17          0
2004-04-01   12          42          0
2004-05-01   19          29          0
2004-06-01   11          31          0
      .
      .
      .
2017-09-01   3           43           15
2017-10-01   14          34           24
2017-11-01   20          50           14
2017-12-01   5           45           21

I've tried

df = pd.concat([df1, df2, df3, .....], axis = 1)

but it just stacked the frames and ignored the index. I also tried merge, but it didn't work either.

EDIT:

pd.merge(df1,df2,left_index=True,right_index=True,how='outer').fillna(0)

This did exactly what I wanted. However, is there a way to merge more than two DataFrames? If I had 100 companies, I would not want to repeat this 100 times.

Is this what you are after?

pd.concat([df1,df2], axis=1).fillna(0)

or:

pd.merge(df1,df2,left_index=True,right_index=True,how='outer').fillna(0)
Out[9]: 
            companyA  companyB
2000-01-01      10.0       0.0
2000-02-01      13.0       0.0
2000-03-01      21.0       0.0
2000-04-01      11.0       0.0
2000-05-01       9.0       0.0
2000-06-01      18.0       0.0
2004-01-01       0.0      19.0
2004-02-01       0.0      32.0
2004-03-01       0.0      17.0
2004-04-01       0.0      42.0
2004-05-01       0.0      29.0
2004-06-01       0.0      31.0
2017-09-01       3.0      43.0
2017-10-01      14.0      54.0
2017-11-01      20.0      30.0
2017-12-01       5.0      45.0
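For more than two frames, one option is to pass the whole list to pd.concat with axis=1, which aligns rows on the union of the indexes. The following is a minimal sketch, not a definitive implementation: the frames df_a, df_b, df_c and their values are made up for illustration, and it assumes every frame has a unique DatetimeIndex and a distinct column name.

```python
import pandas as pd

# Hypothetical sample frames with monthly datetime indexes of different lengths.
df_a = pd.DataFrame({"companyA": [10, 13, 21]},
                    index=pd.to_datetime(["2000-01-01", "2000-02-01", "2000-03-01"]))
df_b = pd.DataFrame({"companyB": [32, 17]},
                    index=pd.to_datetime(["2000-02-01", "2000-03-01"]))
df_c = pd.DataFrame({"companyC": [15]},
                    index=pd.to_datetime(["2000-03-01"]))

frames = [df_a, df_b, df_c]  # could just as well hold 100 frames

# axis=1 puts the frames side by side; dates missing from a frame become NaN,
# which fillna(0) then replaces with 0.
combined = pd.concat(frames, axis=1).fillna(0)
print(combined)
```

If the frames still end up stacked with this call, it may be worth checking that each index really has a datetime dtype (for example with df.index.dtype) rather than strings.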

You can also use .join for this purpose

df1.join(df2, how='outer').join(df3, how='outer')

.join(dataFrame, how='outer')

joins the dataframes so that the resulting index is the union of the indexes of all the dataframes involved.
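Rather than chaining one .join call per frame, DataFrame.join also accepts a list of frames. Here is a small sketch under some assumptions: the frames and their values are invented for illustration, the column names do not overlap, and each index is unique.

```python
import pandas as pd

# Hypothetical frames; df_b starts earlier than df_a to show the index union.
df_a = pd.DataFrame({"companyA": [13, 21]},
                    index=pd.to_datetime(["2000-02-01", "2000-03-01"]))
df_b = pd.DataFrame({"companyB": [19, 32]},
                    index=pd.to_datetime(["2000-01-01", "2000-02-01"]))
df_c = pd.DataFrame({"companyC": [15]},
                    index=pd.to_datetime(["2000-03-01"]))

others = [df_b, df_c]

# Joining a list performs an index-on-index join across all frames at once.
# sort_index() makes the row order explicit instead of relying on join order.
joined = df_a.join(others, how="outer").fillna(0).sort_index()
print(joined)
```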

I had the same problem with lots of DataFrames that I wanted to combine. A recursive function solved it for me.

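from random import randint
import numpy as np
import pandas as pd

def rand_dataframe(x):
    # Random-length frame; the column name is unique per frame so that
    # repeated merges do not collide on column labels.
    rnd = randint(2,10)
    return pd.DataFrame(np.random.rand(rnd), index = range(rnd), columns = ['company%d' % x])


def rec_merge(data, merged = None):
    # data: list of DataFrames still to merge; merged: the result so far.
    if len(data) == 0:
        return merged
    if merged is None:
        return rec_merge(data[1:], data[0])
    return rec_merge(data[1:], pd.merge(merged, data[0], left_index=True, right_index=True, how='outer').fillna(0))


# list(...) is required on Python 3, where map returns an iterator
# that supports neither len() nor slicing.
dummy = list(map(rand_dataframe, range(randint(2,10))))
rec_merge(dummy)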
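The same pairwise outer merge can also be written without recursion by folding the list with functools.reduce, which avoids hitting Python's recursion limit when there are very many frames. This is only a sketch: merge_all and merge_pair are hypothetical helper names, and the sample frames are made up with distinct column names.

```python
from functools import reduce

import pandas as pd


def merge_all(frames):
    """Outer-merge a non-empty list of DataFrames on their indexes."""
    def merge_pair(left, right):
        return pd.merge(left, right, left_index=True, right_index=True, how="outer")

    # reduce applies merge_pair left to right: ((f0 + f1) + f2) + ...
    return reduce(merge_pair, frames).fillna(0)


# Hypothetical frames of lengths 2..6, each with its own column name.
frames = [pd.DataFrame({"company%d" % i: range(i + 2)}, index=range(i + 2))
          for i in range(5)]

result = merge_all(frames)
print(result)
```

Filling the missing values once at the end, rather than after every merge, gives the same result here and saves some work.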

