
Merging more than two pandas data frames

I have several data frames like these:

num  a    --  num  b    --  num  c    --   num  d
101  0        101  1        102  0         101  1
102  1        103  1        103  0         102  0
103  0        104  0        104  1         103  1
104  0        105  0        105  1         104  1
105  1        107  1        106  1         106  0
106  1        108  1        107  1         107  0

I have them in a list called frames. I want to do something like pd.concat(frames) and get this result:

num    a    b    c    d
101    0    1  NaN    1
102    1  NaN    0    0
103    0    1    0    1
104    0    0    1    1
105    1    0    1  NaN
106    1  NaN    1    0
107  NaN    1    1    0
108  NaN    1  NaN  NaN

However, I think I need pd.merge so that num is used as the join column. As far as I can tell, merge only combines two data frames at a time. Should I call it in a loop to merge all of my data frames, can I do this with concat, or is there another (better) way?

UPDATE:

import io
import pandas as pd

dfs = []

data = """\
num  a
101  0
102  1
103  0
104  0
105  1
106  1
"""
# sep=r"\s+" splits on whitespace; delim_whitespace=True is deprecated since pandas 2.2
dfs.append(pd.read_csv(io.StringIO(data), sep=r"\s+"))

data = """\
num  b
101  1
103  1
104  0
105  0
107  1
108  1
"""
dfs.append(pd.read_csv(io.StringIO(data), sep=r"\s+"))

data = """\
num  c
102  0
103  0
104  1
105  1
106  1
107  1
"""
dfs.append(pd.read_csv(io.StringIO(data), sep=r"\s+"))

data = """\
num  d
101  1
102  0
103  1
104  1
106  0
107  0
"""
dfs.append(pd.read_csv(io.StringIO(data), sep=r"\s+"))

Let's set num as the index of every frame:

# set_index returns a new frame, so rebuild the list instead of mutating in place
dfs = [frame.set_index('num') for frame in dfs]


df = pd.concat(dfs, axis=1)

This yields:

In [116]: df
Out[116]:
       a    b    c    d
num
101  0.0  1.0  NaN  1.0
102  1.0  NaN  0.0  0.0
103  0.0  1.0  0.0  1.0
104  0.0  0.0  1.0  1.0
105  1.0  0.0  1.0  NaN
106  1.0  NaN  1.0  0.0
107  NaN  1.0  1.0  0.0
108  NaN  1.0  NaN  NaN
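The values print as floats because NaN cannot be stored in a plain integer column, so pandas promotes each column to float64. If integer display matters, one option is to convert to the nullable "Int64" dtype. This is a minimal sketch on a small hand-built frame (the data below is illustrative, not the full frame from above):

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same shape of problem: floats plus NaN.
df = pd.DataFrame(
    {"a": [0.0, 1.0, np.nan], "b": [1.0, np.nan, 1.0]},
    index=pd.Index([101, 102, 107], name="num"),
)

# "Int64" (capital I) is the nullable integer dtype; missing values become <NA>.
df_int = df.astype("Int64")
print(df_int)
```

The conversion only succeeds when every non-missing value is a whole number, which is the case for the 0/1 flags here.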

OLD answer:

Try pd.concat(..., axis=1):

pd.concat(frames, axis=1)

This concatenates your frames horizontally and aligns the rows by index, so you will want to set an appropriate index (here, num) on each frame beforehand.
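As a rough sketch of that advice, the index can be set inside a list comprehension and turned back into a regular column afterwards. The two frames below (left and right) are small hypothetical stand-ins for the frames in the question:

```python
import pandas as pd

# Hypothetical frames that share a 'num' key column.
left = pd.DataFrame({"num": [101, 102, 103], "a": [0, 1, 0]})
right = pd.DataFrame({"num": [101, 103, 104], "b": [1, 1, 0]})
frames = [left, right]

# Move 'num' into the index so concat aligns rows on it.
combined = pd.concat([frame.set_index("num") for frame in frames], axis=1)

# Sort by key and restore 'num' as an ordinary column.
combined = combined.sort_index().reset_index()
print(combined)
```

Rows whose key is missing from one of the frames get NaN in that frame's columns, which matches the outer-join behaviour asked for in the question.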

Apart from pd.concat, you can also use pd.merge, applying an outer join on num to the frames one after another.

import pandas as pd
import io
a = pd.read_csv(
    io.StringIO(
        "num,a\n101,0\n102,1\n103,0\n104,0\n105,1\n106,1\n"
    ),
    header = 0
)

b = pd.read_csv(
    io.StringIO(
        "num,b\n101,1\n103,1\n104,0\n105,0\n107,1\n108,1\n"
    ),
    header = 0
)

c = pd.read_csv(
    io.StringIO(
        "num,c\n102,0\n103,0\n104,1\n105,1\n106,1\n107,1\n"
    ),
    header = 0
)

d = pd.read_csv(
    io.StringIO(
        "num,d\n101,1\n102,0\n103,1\n104,1\n106,0\n107,0\n"
    ),
    header = 0
)

mylist = [a, b, c, d]

# Start from the first frame and outer-merge the remaining ones on 'num'
result = mylist[0]
for frame in mylist[1:]:
    result = pd.merge(result, frame, how='outer', on='num')

This gives the following result:

In [14]: result
Out[14]: 

   num    a    b    c    d
0  101  0.0  1.0  NaN  1.0
1  102  1.0  NaN  0.0  0.0
2  103  0.0  1.0  0.0  1.0
3  104  0.0  0.0  1.0  1.0
4  105  1.0  0.0  1.0  NaN
5  106  1.0  NaN  1.0  0.0
6  107  NaN  1.0  1.0  0.0
7  108  NaN  1.0  NaN  NaN
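The same pairwise merging can also be written as a fold with functools.reduce from the standard library. This is only a sketch: the helper name merge_on_num and the three small frames are made up for illustration.

```python
from functools import reduce

import pandas as pd

# Small illustrative frames sharing the 'num' key.
a = pd.DataFrame({"num": [101, 102, 103], "a": [0, 1, 0]})
b = pd.DataFrame({"num": [101, 103, 104], "b": [1, 1, 0]})
c = pd.DataFrame({"num": [102, 103, 104], "c": [0, 0, 1]})


def merge_on_num(left, right):
    """Outer-merge two frames on the 'num' column (hypothetical helper)."""
    return pd.merge(left, right, how="outer", on="num")


# reduce applies merge_on_num to (a, b), then to that result and c.
result = reduce(merge_on_num, [a, b, c])
print(result)
```

This does the same work as the explicit loop, one merge per additional frame, so the choice between the two is mostly a matter of taste.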
