I have created three pandas DataFrames by applying a group-by to three different data sets, each with columns A, B and C, using:
Resultdf = SessionDev.query(AppDetails).filter(text("A in ('20170727L00319')")).all()
df1 = Resultdf.groupby(["A", "B", "C"]).size().reset_index(name='Count')
[df1]
 | A | B | C | Count
0 | 20170727L00319 | 423605030008907 | 319 | 1
1 | 20170727L00319 | 42360604002461 | 319 | 1
[df2]
 | A | B | C | Count
0 | 20170727L00319 | 423605030008907 | 319 | 2
1 | 20170727L00319 | 423606040002461 | 319 | 2
[df3]
 | A | B | C | Count
0 | 20170727L00319 | 423605030008907 | 319 | 1
1 | 20170727L00319 | 423606040002461 | 319 | 2
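For reference, the grouping step that produces frames like the ones above can be sketched as follows. This is only an illustration: the `rows` frame is a hypothetical stand-in for the queried data, and it assumes the rows are already in a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the queried rows; the real data comes from the database.
rows = pd.DataFrame({
    "A": ["20170727L00319"] * 3,
    "B": ["423605030008907", "423605030008907", "423606040002461"],
    "C": ["319", "319", "319"],
})

# Count the rows in each (A, B, C) group, then turn the group keys
# back into ordinary columns next to a "Count" column.
grouped = rows.groupby(["A", "B", "C"]).size().reset_index(name="Count")
print(grouped)
```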
I want to perform a union (excluding duplicates) of the three grouped DataFrames above, producing a single DataFrame with distinct rows.
I have tried to concat the three DataFrames and then remove the duplicates using drop_duplicates, but I am unable to get the distinct result:
 | A | B | C
0 | 20170727L00319 | 423605030008907 | 319
1 | 20170727L00319 | 423606040002461 | 319
2 | 20170727L00319 | 423605030008907 | 319
3 | 20170727L00319 | 42360604002461 | 319
5 | 20170727L00319 | 423606040002461 | 319
Using
FinalUnion = pd.concat([df.drop(columns=['Count']) for df in (df1, df2, df3)], ignore_index=True)
FinalUnion = FinalUnion.drop_duplicates(['B', 'C'], keep='first')
I am expecting the result below:
 | A | B | C
0 | 20170727L00319 | 423605030008907 | 319
1 | 20170727L00319 | 423606040002461 | 319
3 | 20170727L00319 | 42360604002461 | 319
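A minimal sketch of the intended union, assuming every key column has the same dtype in all three frames. The `make_df` helper and the sample values are hypothetical and only mirror the tables above.

```python
import pandas as pd

def make_df(b_values, counts):
    # Hypothetical helper that builds a grouped frame shaped like df1/df2/df3.
    return pd.DataFrame({
        "A": ["20170727L00319"] * len(b_values),
        "B": b_values,
        "C": ["319"] * len(b_values),
        "Count": counts,
    })

df1 = make_df(["423605030008907", "42360604002461"], [1, 1])
df2 = make_df(["423605030008907", "423606040002461"], [2, 2])
df3 = make_df(["423605030008907", "423606040002461"], [1, 2])

keys = ["A", "B", "C"]
# Keep only the key columns, stack the frames, then drop repeated key rows.
final_union = (
    pd.concat([df[keys] for df in (df1, df2, df3)], ignore_index=True)
    .drop_duplicates(subset=keys, keep="first")
)
print(final_union)
```

Selecting the key columns before concatenating avoids carrying the differing Count values into the duplicate check.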
Update:
After performing drop_duplicates on columns A and B, I got a distinct result, but performing drop_duplicates on any other combination of columns seemed to fail.
The issue was simple. I load data from three different tables into three different models and then into three different pandas DataFrames, and then perform the group-by, the concat and the drop_duplicates to get a distinct result.
Resolution: column [C] had datatype varchar in the first two tables, whereas in the third table it was bigint, because of which drop_duplicates failed to give the appropriate result.
Changing the datatype gave the exact result. Another way to convert the datatype dynamically is df1[["C"]] = df1[["C"]].apply(pd.to_numeric)
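The effect of the mismatch can be reproduced with a small sketch. The two frames below are hypothetical: C is text in one and an integer in the other, standing in for the varchar and bigint columns.

```python
import pandas as pd

# Hypothetical frames: same logical row, but C is a string in one
# and an integer in the other.
df_text = pd.DataFrame({"B": ["423605030008907"], "C": ["319"]})
df_int = pd.DataFrame({"B": ["423605030008907"], "C": [319]})

mixed = pd.concat([df_text, df_int], ignore_index=True)

# "319" (str) and 319 (int) compare as different values,
# so drop_duplicates keeps both rows.
rows_before = len(mixed.drop_duplicates(["B", "C"]))

# Convert C to a numeric dtype, after which the rows are true duplicates.
mixed["C"] = pd.to_numeric(mixed["C"])
rows_after = len(mixed.drop_duplicates(["B", "C"]))

print(rows_before, rows_after)
```

Checking `df.dtypes` on each frame before concatenating is a quick way to spot this kind of mismatch.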