I have several data frames like these:
num  a  --  num  b  --  num  c  --  num  d
101  0      101  1      102  0      101  1
102  1      103  1      103  0      102  0
103  0      104  0      104  1      103  1
104  0      105  0      105  1      104  1
105  1      107  1      106  1      106  0
106  1      108  1      107  1      107  0
I have them in a list called frames. I want to do something like pd.concat(frames) and get this result:
num  a    b    c    d
101  0    1    NaN  1
102  1    NaN  0    0
103  0    1    0    1
104  0    0    1    1
105  1    0    1    NaN
106  1    NaN  1    0
107  NaN  1    1    0
108  NaN  1    NaN  NaN
But I think I should use pd.merge so that num is the join column. As far as I can tell, merge only combines two data frames at a time. Should I call it in a loop to merge all my data frames, can I do this with concat, or is there another (better) way?
UPDATE:
import io
import pandas as pd

dfs = []
data = """\
num a
101 0
102 1
103 0
104 0
105 1
106 1
"""
dfs.append(pd.read_csv(io.StringIO(data), sep=r"\s+"))
data = """\
num b
101 1
103 1
104 0
105 0
107 1
108 1
"""
dfs.append(pd.read_csv(io.StringIO(data), sep=r"\s+"))
data = """\
num c
102 0
103 0
104 1
105 1
106 1
107 1
"""
dfs.append(pd.read_csv(io.StringIO(data), sep=r"\s+"))
data = """\
num d
101 1
102 0
103 1
104 1
106 0
107 0
"""
dfs.append(pd.read_csv(io.StringIO(data), sep=r"\s+"))
Let's set num as the index:
for i in range(len(dfs)):
    dfs[i].set_index('num', inplace=True)
df = pd.concat(dfs, axis=1)
yields:
In [116]: df
Out[116]:
       a    b    c    d
num
101  0.0  1.0  NaN  1.0
102  1.0  NaN  0.0  0.0
103  0.0  1.0  0.0  1.0
104  0.0  0.0  1.0  1.0
105  1.0  0.0  1.0  NaN
106  1.0  NaN  1.0  0.0
107  NaN  1.0  1.0  0.0
108  NaN  1.0  NaN  NaN
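The question's desired output has num as an ordinary column rather than as the index. A minimal sketch of one way to get that, assuming every frame has a num column; the two small frames and the name wide are made up for illustration:

```python
import io
import pandas as pd

# Two small frames standing in for the question's `frames` list (sample data only).
frames = [
    pd.read_csv(io.StringIO("num a\n101 0\n102 1\n"), sep=r"\s+"),
    pd.read_csv(io.StringIO("num b\n101 1\n103 1\n"), sep=r"\s+"),
]

# Index every frame by 'num', align on that index, then turn 'num' back into a column.
wide = pd.concat([f.set_index("num") for f in frames], axis=1).reset_index()
print(wide)
```

Using set_index without inplace=True leaves the original frames untouched, so the snippet can be rerun safely.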
OLD answer:
Try pd.concat(..., axis=1):
pd.concat(frames, axis=1)
It concatenates your frames horizontally, aligning rows by index, so set an appropriate index (here num) on each frame beforehand.
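To see why the index matters, here is a small sketch comparing the two cases. The frames a and b and the names by_position and by_key are invented for this illustration:

```python
import pandas as pd

a = pd.DataFrame({"num": [101, 102], "a": [0, 1]})
b = pd.DataFrame({"num": [101, 103], "b": [1, 1]})

# Default RangeIndex: rows are paired by position, so 102 and 103 end up
# on the same row and 'num' appears twice among the columns.
by_position = pd.concat([a, b], axis=1)

# Indexed by 'num': rows are paired by key, with NaN where a key is missing.
by_key = pd.concat([a.set_index("num"), b.set_index("num")], axis=1)

print(by_position)
print(by_key)
```

With the default index the result has two rows and a duplicated num column; with num as the index it has one row per distinct num value.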
Apart from pd.concat, you can also use pd.merge.
import pandas as pd
import io
a = pd.read_csv(
    io.StringIO(
        "num,a\n101,0\n102,1\n103,0\n104,0\n105,1\n106,1\n"
    ),
    header=0
)
b = pd.read_csv(
    io.StringIO(
        "num,b\n101,1\n103,1\n104,0\n105,0\n107,1\n108,1\n"
    ),
    header=0
)
c = pd.read_csv(
    io.StringIO(
        "num,c\n102,0\n103,0\n104,1\n105,1\n106,1\n107,1\n"
    ),
    header=0
)
d = pd.read_csv(
    io.StringIO(
        "num,d\n101,1\n102,0\n103,1\n104,1\n106,0\n107,0\n"
    ),
    header=0
)
mylist = [a, b, c, d]
# Start from the first frame and outer-merge each remaining frame on 'num'.
result = mylist[0]
for frame in mylist[1:]:
    result = pd.merge(
        result,
        frame,
        how='outer',
        on='num'
    )
This produces the following result:
In [14]: result
Out[14]:
   num    a    b    c    d
0  101  0.0  1.0  NaN  1.0
1  102  1.0  NaN  0.0  0.0
2  103  0.0  1.0  0.0  1.0
3  104  0.0  0.0  1.0  1.0
4  105  1.0  0.0  1.0  NaN
5  106  1.0  NaN  1.0  0.0
6  107  NaN  1.0  1.0  0.0
7  108  NaN  1.0  NaN  NaN
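The same pairwise merging can also be written without an explicit loop by folding the list with functools.reduce. This is only a sketch of an alternative spelling, not something from the answer above; the three small frames and the name merged are hypothetical:

```python
from functools import reduce

import pandas as pd

# Small sample frames standing in for the real list (illustration only).
frames = [
    pd.DataFrame({"num": [101, 102], "a": [0, 1]}),
    pd.DataFrame({"num": [101, 103], "b": [1, 1]}),
    pd.DataFrame({"num": [102, 103], "c": [0, 0]}),
]

# Outer-merge the frames two at a time on 'num', carrying the running result forward.
merged = reduce(
    lambda left, right: pd.merge(left, right, how="outer", on="num"),
    frames,
)
print(merged)
```

It still performs one merge per additional frame, so it behaves like the loop; the difference is only in how the iteration is expressed.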