I've been on this all night, and just can't figure it out, even though I know it should be simple. So, my sincerest apologies for the following incantation from a sleep-deprived fellow:
So, I have four fields: Employee ID, Name, Station and Shift (ID is a non-null integer, the rest are strings or null).
I have about 10 dataframes, all indexed by ID, each containing only two columns: either (Name and Station) or (Name and Shift).
Now of course, I want to combine all of this into one dataframe, which has a unique row for each ID.
But I'm really frustrated by it at this point (especially because I can't find a way to directly check how many unique indices my final dataframe ends up with).
After messing around with some very ugly ways of using .merge(), I finally found pd.concat(). But it keeps making multiple rows per ID: when I check in Excel, the indices are like Table1/1234, Table2/1234, etc. One row has the shift, the other has the station, which is precisely what I'm trying to avoid.
How do I compile all my data into one dataframe, having exactly one row per ID? Possibly without using 9 different merge statements, as I have to scale up later.
If I understand your question correctly, this is what you want.
For example, with these three dataframes:
In [1]: df1
Out[1]:
0 1 2
0 3.588843 3.566220 6.518865
1 7.585399 4.269357 4.781765
2 9.242681 7.228869 5.680521
3 3.600121 3.931781 4.616634
4 9.830029 9.177663 9.842953
5 2.738782 3.767870 0.925619
6 0.084544 6.677092 1.983105
7 5.229042 4.729659 8.638492
8 8.575547 6.453765 6.055660
9 4.386650 5.547295 8.475186
In [2]: df2
Out[2]:
0 1
0 95.013170 90.382886
2 1.317641 29.600709
4 89.908139 21.391058
6 31.233153 3.902560
8 17.186079 94.768480
In [3]: df
Out[3]:
0 1 2
0 0.777689 0.357484 0.753773
1 0.271929 0.571058 0.229887
2 0.417618 0.310950 0.450400
3 0.682350 0.364849 0.933218
4 0.738438 0.086243 0.397642
5 0.237481 0.051303 0.083431
6 0.543061 0.644624 0.288698
7 0.118142 0.536156 0.098139
8 0.892830 0.080694 0.084702
9 0.073194 0.462129 0.015707
You can do
pd.concat([df,df1,df2], axis=1)
This produces
In [6]: pd.concat([df,df1,df2], axis=1)
Out[6]:
0 1 2 0 1 2 0 1
0 0.777689 0.357484 0.753773 3.588843 3.566220 6.518865 95.013170 90.382886
1 0.271929 0.571058 0.229887 7.585399 4.269357 4.781765 NaN NaN
2 0.417618 0.310950 0.450400 9.242681 7.228869 5.680521 1.317641 29.600709
3 0.682350 0.364849 0.933218 3.600121 3.931781 4.616634 NaN NaN
4 0.738438 0.086243 0.397642 9.830029 9.177663 9.842953 89.908139 21.391058
5 0.237481 0.051303 0.083431 2.738782 3.767870 0.925619 NaN NaN
6 0.543061 0.644624 0.288698 0.084544 6.677092 1.983105 31.233153 3.902560
7 0.118142 0.536156 0.098139 5.229042 4.729659 8.638492 NaN NaN
8 0.892830 0.080694 0.084702 8.575547 6.453765 6.055660 17.186079 94.768480
9 0.073194 0.462129 0.015707 4.386650 5.547295 8.475186 NaN NaN
For more details, you might want to see the pd.concat documentation.
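One caveat for the tables described in the question: every one of them carries a Name column, so concatenating along axis=1 would repeat Name once per table. A possible alternative, sketched here under assumptions, is to stack the tables vertically and then keep the first non-null value per column for each ID. The three small frames and their contents below are made up for illustration; only the column names come from the question. The last lines also show one way to check how many unique IDs the result has.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's tables, each indexed by employee ID.
stations = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Station": ["A1", "B2"]},
    index=pd.Index([1234, 5678], name="ID"),
)
shifts = pd.DataFrame(
    {"Name": ["Ann", "Cid"], "Shift": ["Day", "Night"]},
    index=pd.Index([1234, 9012], name="ID"),
)
more_shifts = pd.DataFrame(
    {"Name": ["Bob"], "Shift": ["Evening"]},
    index=pd.Index([5678], name="ID"),
)

frames = [stations, shifts, more_shifts]  # any number of frames works

# Stack all rows (missing columns become NaN), then collapse to one row
# per ID by taking the first non-null value in each column.
combined = pd.concat(frames).groupby(level=0).first()

print(combined)

# Checking the index of the result:
print(combined.index.is_unique)   # True when there is one row per ID
print(combined.index.nunique())   # number of distinct IDs
```

Note that if two tables disagree on a value for the same ID, this sketch silently keeps whichever comes first in the list, so the order of frames matters.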
Just a tip: putting simple illustrative data in your question always helps in getting an answer.