
Merging two dataframes based on index

I've been on this all night, and just can't figure it out, even though I know it should be simple. So, my sincerest apologies for the following incantation from a sleep-deprived fellow:

So, I have four fields: Employee ID, Name, Station and Shift (ID is a non-null integer; the rest are strings or null).

I have about 10 dataframes, all indexed by ID, each containing only two columns: either (Name and Station) or (Name and Shift).

Now of course, I want to combine all of this into one dataframe, which has a unique row for each ID.

But I'm really frustrated by it at this point (especially because I can't find a way to directly check how many unique indices my final dataframe ends up with).
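On the side question of checking how many unique index values a dataframe has, here is a minimal sketch. The frame `combined` and its values are made up for illustration; `Index.nunique()`, `Index.is_unique` and `Index.duplicated()` are standard pandas.

```python
import pandas as pd

# Hypothetical stand-in for a combined frame where one ID appears twice.
combined = pd.DataFrame(
    {"Name": ["Ann", "Ann", "Bob"], "Station": ["A1", None, "B2"]},
    index=pd.Index([1234, 1234, 5678], name="ID"),
)

n_rows = len(combined)                        # total number of rows
n_unique_ids = combined.index.nunique()       # number of distinct index values
has_unique_index = combined.index.is_unique   # True only if no ID repeats

# IDs that occur more than once, handy for spotting what went wrong.
duplicated_ids = combined.index[combined.index.duplicated()].unique().tolist()

print(n_rows, n_unique_ids, has_unique_index, duplicated_ids)
# 3 2 False [1234]
```

If `n_rows` and `n_unique_ids` differ, at least one ID still has more than one row.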

After messing around with some very ugly ways of using .merge(), I finally found pd.concat(). But it keeps making multiple rows per ID: when I check in Excel, the indices are like Table1/1234, Table2/1234, etc. One row has the shift and the other has the station, which is precisely what I'm trying to avoid.
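A likely explanation for the Table1/1234, Table2/1234 rows, assuming concat was called with its default `axis=0` and a `keys=` argument (the question does not show the actual call): the frames are stacked on top of each other instead of being aligned on ID. The sketch below uses two made-up tables, `t1` and `t2`, to show the difference.

```python
import pandas as pd

# Two hypothetical tables indexed by employee ID.
t1 = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Station": ["A1", "B2"]},
    index=pd.Index([1234, 5678], name="ID"),
)
t2 = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Shift": ["Day", "Night"]},
    index=pd.Index([1234, 5678], name="ID"),
)

# Default axis=0 stacks rows; keys= adds an outer index level, so every ID
# shows up once per table: (Table1, 1234), (Table2, 1234), ...
stacked = pd.concat([t1, t2], keys=["Table1", "Table2"])

# axis=1 aligns on the index instead, giving one row per ID. The shared
# "Name" column then appears twice, once from each table.
side_by_side = pd.concat([t1, t2], axis=1)

print(stacked)
print(side_by_side)
```

So `axis=1` fixes the row count, but because every table carries a Name column, the result still has duplicate column names that need cleaning up.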

How do I compile all my data into one dataframe, having exactly one row per ID? Possibly without using 9 different merge statements, as I have to scale up later.

If I understand your question correctly, this is the thing that you want.

For example, with these three dataframes:

In [1]: df1
Out[1]:
          0         1         2
0  3.588843  3.566220  6.518865
1  7.585399  4.269357  4.781765
2  9.242681  7.228869  5.680521
3  3.600121  3.931781  4.616634
4  9.830029  9.177663  9.842953
5  2.738782  3.767870  0.925619
6  0.084544  6.677092  1.983105
7  5.229042  4.729659  8.638492
8  8.575547  6.453765  6.055660
9  4.386650  5.547295  8.475186

In [2]: df2
Out[2]:
           0          1
0  95.013170  90.382886
2   1.317641  29.600709
4  89.908139  21.391058
6  31.233153   3.902560
8  17.186079  94.768480

In [3]: df
Out[3]:
          0         1         2
0  0.777689  0.357484  0.753773
1  0.271929  0.571058  0.229887
2  0.417618  0.310950  0.450400
3  0.682350  0.364849  0.933218
4  0.738438  0.086243  0.397642
5  0.237481  0.051303  0.083431
6  0.543061  0.644624  0.288698
7  0.118142  0.536156  0.098139
8  0.892830  0.080694  0.084702
9  0.073194  0.462129  0.015707

You can do

pd.concat([df,df1,df2], axis=1)

This produces

In [6]: pd.concat([df,df1,df2], axis=1)
Out[6]:
          0         1         2         0         1         2          0          1
0  0.777689  0.357484  0.753773  3.588843  3.566220  6.518865  95.013170  90.382886
1  0.271929  0.571058  0.229887  7.585399  4.269357  4.781765        NaN        NaN
2  0.417618  0.310950  0.450400  9.242681  7.228869  5.680521   1.317641  29.600709
3  0.682350  0.364849  0.933218  3.600121  3.931781  4.616634        NaN        NaN
4  0.738438  0.086243  0.397642  9.830029  9.177663  9.842953  89.908139  21.391058
5  0.237481  0.051303  0.083431  2.738782  3.767870  0.925619        NaN        NaN
6  0.543061  0.644624  0.288698  0.084544  6.677092  1.983105  31.233153   3.902560
7  0.118142  0.536156  0.098139  5.229042  4.729659  8.638492        NaN        NaN
8  0.892830  0.080694  0.084702  8.575547  6.453765  6.055660  17.186079  94.768480
9  0.073194  0.462129  0.015707  4.386650  5.547295  8.475186        NaN        NaN
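The frames in the question all share a Name column, so a plain `axis=1` concat would repeat that column once per table. One alternative, not part of the original answer, is to fold the frames together with `DataFrame.combine_first`, which aligns on the index and fills missing values from the next frame. This is only a sketch: the `frames` list and its values are invented to mimic the Name/Station/Shift layout described in the question.

```python
from functools import reduce

import pandas as pd

# Hypothetical tables: each has Name plus either Station or Shift.
frames = [
    pd.DataFrame(
        {"Name": ["Ann", "Bob"], "Station": ["A1", "B2"]},
        index=pd.Index([1234, 5678], name="ID"),
    ),
    pd.DataFrame(
        {"Name": ["Ann", "Cid"], "Shift": ["Day", "Night"]},
        index=pd.Index([1234, 9012], name="ID"),
    ),
    pd.DataFrame(
        {"Name": ["Bob"], "Shift": ["Late"]},
        index=pd.Index([5678], name="ID"),
    ),
]

# Fold the list pairwise: values already present in `left` win,
# gaps are filled from `right`, and the index becomes the union of IDs.
combined = reduce(lambda left, right: left.combine_first(right), frames)

print(combined)
print(combined.index.is_unique)
```

This scales to any number of frames without writing one merge statement per table. Note that if two tables disagree about a value for the same ID, the earlier frame in the list silently wins, so it is worth checking for conflicts first.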

For more details, you might want to see the documentation for pd.concat.

Just a tip: putting simple illustrative data in your question always helps in getting an answer.

