Merge of more than 2 python pandas data frames

Question

I have some data frames like these:

num  a    --  num  b    --  num  c    --   num  d
101  0        101  1        102  0         101  1
102  1        103  1        103  0         102  0
103  0        104  0        104  1         103  1
104  0        105  0        105  1         104  1
105  1        107  1        106  1         106  0
106  1        108  1        107  1         107  0

I have them in a list called frames. I want to do something like pd.concat(frames) and get this result:

num   a    b    c    d
101   0    1    NaN  1
102   1    NaN  0    0
103   0    1    0    1
104   0    0    1    1
105   1    0    1    NaN
106   1    NaN  1    0
107   NaN  1    1    0
108   NaN  1    NaN  NaN

But I think I should use pd.merge with num as the join column. As far as I can tell, merge only combines two data frames at a time. Should I call it in a loop to merge all of them, can I do this with concat, or is there another (better) way?

Answer 1

UPDATE:

import io

import pandas as pd

dfs = []

data = """\
num  a
101  0
102  1
103  0
104  0
105  1
106  1
"""
dfs.append(pd.read_csv(io.StringIO(data), sep=r'\s+'))

data = """\
num  b
101  1
103  1
104  0
105  0
107  1
108  1
"""
dfs.append(pd.read_csv(io.StringIO(data), sep=r'\s+'))

data = """\
num  c
102  0
103  0
104  1
105  1
106  1
107  1
"""
dfs.append(pd.read_csv(io.StringIO(data), sep=r'\s+'))

data = """\
num  d
101  1
102  0
103  1
104  1
106  0
107  0
"""
dfs.append(pd.read_csv(io.StringIO(data), sep=r'\s+'))

Let's set num as the index:

dfs = [f.set_index('num') for f in dfs]


df = pd.concat(dfs, axis=1)

yields:

In [116]: df
Out[116]:
       a    b    c    d
num
101  0.0  1.0  NaN  1.0
102  1.0  NaN  0.0  0.0
103  0.0  1.0  0.0  1.0
104  0.0  0.0  1.0  1.0
105  1.0  0.0  1.0  NaN
106  1.0  NaN  1.0  0.0
107  NaN  1.0  1.0  0.0
108  NaN  1.0  NaN  NaN
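The 0/1 columns come out as floats (0.0, 1.0) because NaN is a floating-point value, so any column with a gap is upcast to float64. If integers matter, one option is pandas' nullable "Int64" dtype. A minimal sketch, assuming a reasonably recent pandas (1.0 or newer) and using a two-frame subset of the question's data for brevity:

```python
import io

import pandas as pd

# Two small frames sharing the key column "num" (a subset of the question's data)
left = pd.read_csv(io.StringIO("num,a\n101,0\n102,1\n"))
right = pd.read_csv(io.StringIO("num,b\n101,1\n103,1\n"))

# Align on "num"; missing combinations become NaN, which upcasts both columns to float64
df = pd.concat([left.set_index("num"), right.set_index("num")], axis=1)
print(df.dtypes)

# The nullable integer dtype keeps 0/1 as integers and shows the gaps as <NA>
df_int = df.astype("Int64")
print(df_int)
```

The cast only succeeds because every non-missing value is a whole number; a column holding something like 0.5 would raise instead of being silently truncated.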

OLD answer:

Try pd.concat(..., axis=1):

pd.concat(frames, axis=1)

It concatenates your frames horizontally, aligning rows on the index, so you should set an appropriate index (here, num) beforehand.
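The same idea as a self-contained sketch, assuming the four frames sit in a list named frames as in the question. Calling set_index inside a list comprehension returns new frames, so the originals are not modified in place; the sort_index().reset_index() step at the end is an optional extra that turns num back into a regular column.

```python
import io

import pandas as pd

# The question's four frames, each with the key column "num"
frames = [
    pd.read_csv(io.StringIO("num,a\n101,0\n102,1\n103,0\n104,0\n105,1\n106,1\n")),
    pd.read_csv(io.StringIO("num,b\n101,1\n103,1\n104,0\n105,0\n107,1\n108,1\n")),
    pd.read_csv(io.StringIO("num,c\n102,0\n103,0\n104,1\n105,1\n106,1\n107,1\n")),
    pd.read_csv(io.StringIO("num,d\n101,1\n102,0\n103,1\n104,1\n106,0\n107,0\n")),
]

# set_index returns new frames, so the frames in the list are left untouched
result = pd.concat([f.set_index("num") for f in frames], axis=1)

# Sort by key and turn "num" back into an ordinary column
result = result.sort_index().reset_index()
print(result)
```

This relies on num being unique within each frame; duplicate keys in an index make the horizontal alignment ambiguous.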

Answer 2

Apart from pd.concat, you can also use pd.merge.

import pandas as pd
import io
a = pd.read_csv(
    io.StringIO(
        "num,a\n101,0\n102,1\n103,0\n104,0\n105,1\n106,1\n"
    ),
    header = 0
)

b = pd.read_csv(
    io.StringIO(
        "num,b\n101,1\n103,1\n104,0\n105,0\n107,1\n108,1\n"
    ),
    header = 0
)

c = pd.read_csv(
    io.StringIO(
        "num,c\n102,0\n103,0\n104,1\n105,1\n106,1\n107,1\n"
    ),
    header = 0
)

d = pd.read_csv(
    io.StringIO(
        "num,d\n101,1\n102,0\n103,1\n104,1\n106,0\n107,0\n"
    ),
    header = 0
)

mylist = [a, b, c, d]

# Start with the first frame and outer-merge each remaining frame on "num"
result = mylist[0]
for other in mylist[1:]:
    result = pd.merge(result, other, how='outer', on='num')

This gives the following result:

In [14]: result
Out[14]: 

   num    a    b    c    d
0  101  0.0  1.0  NaN  1.0
1  102  1.0  NaN  0.0  0.0
2  103  0.0  1.0  0.0  1.0
3  104  0.0  0.0  1.0  1.0
4  105  1.0  0.0  1.0  NaN
5  106  1.0  NaN  1.0  0.0
6  107  NaN  1.0  1.0  0.0
7  108  NaN  1.0  NaN  NaN
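The explicit loop can also be written as a fold with functools.reduce from the standard library, which applies the two-frame merge repeatedly across the list. A minimal sketch using the question's data, assuming every frame shares the key column num:

```python
import io
from functools import reduce

import pandas as pd

frames = [
    pd.read_csv(io.StringIO("num,a\n101,0\n102,1\n103,0\n104,0\n105,1\n106,1\n")),
    pd.read_csv(io.StringIO("num,b\n101,1\n103,1\n104,0\n105,0\n107,1\n108,1\n")),
    pd.read_csv(io.StringIO("num,c\n102,0\n103,0\n104,1\n105,1\n106,1\n107,1\n")),
    pd.read_csv(io.StringIO("num,d\n101,1\n102,0\n103,1\n104,1\n106,0\n107,0\n")),
]

# Fold the list pairwise: merge(merge(merge(a, b), c), d),
# keeping every "num" that appears in any frame
result = reduce(
    lambda left, right: pd.merge(left, right, on="num", how="outer"),
    frames,
)
print(result)
```

With how='outer', keys missing from any frame are kept and filled with NaN; how='inner' would keep only the num values present in every frame.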

2 answers

solution1
1 ACCEPTED 2016-04-26 21:22:18

solution2
1 2017-02-09 11:14:08
