
Performing union on three Group by Resultant dataframes with same columns, different order

I have created three pandas DataFrames by applying a group-by to three different data sets, each with columns A, B and C, using:

Resultdf=SessionDev.query(AppDetails).filter(text(" A in ('20170727L00319')")).all()

df1= Resultdf.groupby(["A", "B","C"]).size().reset_index(name='Count')
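The grouping step can be sketched as follows. This is only an illustration: the hand-built DataFrame `raw` and its rows are assumptions standing in for the database query result, which is not shown in full here.

```python
import pandas as pd

# Hypothetical sample rows standing in for the query result.
raw = pd.DataFrame({
    "A": ["20170727L00319"] * 3,
    "B": ["423605030008907", "423605030008907", "423606040002461"],
    "C": ["319", "319", "319"],
})

# Count the rows in each (A, B, C) group and expose the count as a column.
grouped = raw.groupby(["A", "B", "C"]).size().reset_index(name="Count")
print(grouped)
```

Each distinct (A, B, C) combination becomes one row, with `Count` holding the number of source rows in that group.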

[df1]

  | A              | B               | C   | Count
0 | 20170727L00319 | 423605030008907 | 319 | 1
1 | 20170727L00319 | 42360604002461  | 319 | 1

[df2]

  | A              | B               | C   | Count
0 | 20170727L00319 | 423605030008907 | 319 | 2
1 | 20170727L00319 | 423606040002461 | 319 | 2

[df3]

  | A              | B               | C   | Count
0 | 20170727L00319 | 423605030008907 | 319 | 1
1 | 20170727L00319 | 423606040002461 | 319 | 2

I want to perform a union (excluding duplicates) of the three grouped DataFrames above, producing a single DataFrame that contains only distinct rows.

I tried concatenating the three DataFrames and then removing duplicates with drop_duplicates, but the result still contains duplicate rows:

  | A              | B               | C
0 | 20170727L00319 | 423605030008907 | 319
1 | 20170727L00319 | 423606040002461 | 319
2 | 20170727L00319 | 423605030008907 | 319
3 | 20170727L00319 | 42360604002461  | 319
5 | 20170727L00319 | 423606040002461 | 319

Using:

FinalUnion = pd.concat([df1, df2, df3], ignore_index=True, join_axes=[df1.drop(['Count'], axis=1)])

FinalUnion.drop_duplicates(['B', 'C'], keep='first')
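Note that `join_axes` was removed from `pd.concat` in pandas 1.0. A rough equivalent on current pandas is to concatenate, select the key columns, and then drop duplicates. The sketch below assumes hand-built stand-ins for the three grouped frames with every column stored as text; the helper name `union_distinct` is hypothetical.

```python
import pandas as pd

def union_distinct(frames, key_columns):
    """Stack the frames, keep only key_columns, and drop repeated rows."""
    stacked = pd.concat(frames, ignore_index=True)[key_columns]
    return stacked.drop_duplicates(keep="first").reset_index(drop=True)

# Illustrative stand-ins for df1, df2 and df3 (all values stored as strings).
df1 = pd.DataFrame({"A": ["20170727L00319"] * 2,
                    "B": ["423605030008907", "42360604002461"],
                    "C": ["319", "319"], "Count": [1, 1]})
df2 = pd.DataFrame({"A": ["20170727L00319"] * 2,
                    "B": ["423605030008907", "423606040002461"],
                    "C": ["319", "319"], "Count": [2, 2]})
df3 = pd.DataFrame({"A": ["20170727L00319"] * 2,
                    "B": ["423605030008907", "423606040002461"],
                    "C": ["319", "319"], "Count": [1, 2]})

final_union = union_distinct([df1, df2, df3], ["A", "B", "C"])
print(final_union)
```

Because `drop_duplicates` returns a new DataFrame rather than modifying the original, the result has to be assigned (or returned, as here) to be kept.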

I am expecting the result below:

  | A              | B               | C
0 | 20170727L00319 | 423605030008907 | 319
1 | 20170727L00319 | 423606040002461 | 319
3 | 20170727L00319 | 42360604002461  | 319

Update:

After running drop_duplicates on columns A and B, I got a distinct result, but running it on any other combination of columns seemed to fail.

The issue was simple. I loaded data from three different tables into three different models and then into three different pandas DataFrames, and then applied group-by, concat and drop_duplicates to get the distinct result.

Resolution: column C had datatype varchar in the first two tables but bigint in the third, which is why drop_duplicates failed to give the expected result.
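The effect of such a type mismatch can be reproduced in a small sketch. The two frames `left` and `right` are hypothetical; they only mimic one source storing C as text and another storing it as an integer.

```python
import pandas as pd

# Hypothetical frames: C is text in the first and an integer in the second.
left = pd.DataFrame({"B": ["423605030008907"], "C": ["319"]})
right = pd.DataFrame({"B": ["423605030008907"], "C": [319]})

combined = pd.concat([left, right], ignore_index=True)

# The combined column now holds one str and one int.
print(combined["C"].map(type).tolist())

# "319" and 319 are not equal, so neither row counts as a duplicate.
deduped = combined.drop_duplicates(["B", "C"])
print(len(deduped))
```

Printing the two rows side by side would show them as identical, which is what makes this kind of mismatch easy to miss; checking `df.dtypes` on each frame before concatenating exposes it.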

Changing the datatype gave the correct result. Another way to convert the datatype dynamically is df1[["C"]] = df1[["C"]].apply(pd.to_numeric)
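A minimal sketch of that conversion applied before the union, assuming two hypothetical frames whose column C differs only in type:

```python
import pandas as pd

frames = [
    pd.DataFrame({"B": ["423605030008907"], "C": ["319"]}),  # C stored as text
    pd.DataFrame({"B": ["423605030008907"], "C": [319]}),    # C stored as integer
]

# Convert C to a numeric dtype in every frame before combining them.
normalized = [frame.assign(C=pd.to_numeric(frame["C"])) for frame in frames]

result = pd.concat(normalized, ignore_index=True).drop_duplicates(["B", "C"])
print(result)
```

With C numeric in every frame, the two rows compare equal and only one survives. Converting everything to strings with `astype(str)` instead would serve the same purpose, as long as all frames end up with the same type.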

