Performing a union on three group-by result DataFrames with the same columns, different order
I have created three different pandas DataFrames by applying a group-by to three different data sets, each having columns A, B, C, using:
Resultdf=SessionDev.query(AppDetails).filter(text(" A in ('20170727L00319')")).all()
df1= Resultdf.groupby(["A", "B","C"]).size().reset_index(name='Count')
[df1]
A | B | C | Count
0 | 20170727L00319 | 423605030008907 | 319 | 1
1 | 20170727L00319 | 42360604002461 | 319 | 1
[df2]
A | B | C | Count
0 | 20170727L00319 | 423605030008907 | 319 | 2
1 | 20170727L00319 | 423606040002461 | 319 | 2
[df3]
A | B | C | Count
0 | 20170727L00319 | 423605030008907 | 319 | 1
1 | 20170727L00319 | 423606040002461 | 319 | 2
I want to perform a union (excluding duplicates) on the three grouped DataFrames above, combining them into a single DataFrame with distinct rows.
I have tried to concat the three DataFrames and then remove duplicates using drop_duplicates, but I am not getting the expected result. What I get is:
A | B | C
0 | 20170727L00319 | 423605030008907 | 319
1 | 20170727L00319 | 423606040002461 | 319
2 | 20170727L00319 | 423605030008907 | 319
3 | 20170727L00319 | 42360604002461 | 319
5 | 20170727L00319 | 423606040002461 | 319
Using:
# join_axes was removed in pandas 1.0; dropping the Count column after concat keeps A, B, C
FinalUnion = pd.concat([df1, df2, df3], ignore_index=True).drop(columns=['Count'])
FinalUnion = FinalUnion.drop_duplicates(['B', 'C'], keep='first')
I am expecting the result below:
A | B | C
0 | 20170727L00319 | 423605030008907 | 319
1 | 20170727L00319 | 423606040002461 | 319
3 | 20170727L00319 | 42360604002461 | 319
Update:
After performing drop_duplicates on columns A and B, I got a distinct result. But performing drop_duplicates on any other combination seems to fail.
The issue was simple: I loaded data from three different tables into three different models and then into three different pandas DataFrames, then performed the group-by, concat, and drop_duplicates to get a distinct result.
Resolution: Column [C] had datatype varchar in the first two tables, whereas in the third table it was bigint, because of which drop_duplicates failed to give the appropriate result.
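To illustrate the mismatch, here is a minimal sketch with hypothetical one-row stand-ins for the grouped frames (the values are copied from the tables above, but the frames themselves are made up for the example). It assumes C arrives as a string in one frame and as an integer in another, and shows that drop_duplicates then treats '319' and 319 as different values:

```python
import pandas as pd

# Hypothetical stand-ins: C is a string in df_text (varchar source)
# and an integer in df_int (bigint source).
df_text = pd.DataFrame({"A": ["20170727L00319"], "B": ["423605030008907"], "C": ["319"]})
df_int = pd.DataFrame({"A": ["20170727L00319"], "B": ["423605030008907"], "C": [319]})

# Comparing dtypes before concatenating exposes the mismatch.
print(df_text["C"].dtype, df_int["C"].dtype)

combined = pd.concat([df_text, df_int], ignore_index=True)
deduped = combined.drop_duplicates(["B", "C"], keep="first")

# Both rows survive, because the string '319' is not equal to the integer 319.
print(len(deduped))
```

Checking `df.dtypes` on each frame before the concat is a quick way to catch this kind of problem.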
Changing the datatype gave the exact result. Another way to convert the datatype dynamically is df1[["C"]] = df1[["C"]].apply(pd.to_numeric)
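Putting the fix together, the following is a rough sketch rather than the exact code I ran. The helper name union_distinct and the sample frames are hypothetical; the sample values mirror the tables in the question, with C stored as text in the first two frames and as an integer in the third. It coerces C with pd.to_numeric in every frame, keeps only A, B, C, then concatenates and drops duplicates:

```python
import pandas as pd

def union_distinct(frames, key_cols, numeric_cols):
    """Coerce numeric_cols to numbers, concat the frames, drop duplicate key rows."""
    normalized = []
    for frame in frames:
        frame = frame.copy()  # leave the caller's frames untouched
        for col in numeric_cols:
            frame[col] = pd.to_numeric(frame[col])
        normalized.append(frame[key_cols])
    combined = pd.concat(normalized, ignore_index=True)
    return combined.drop_duplicates(key_cols, keep="first").reset_index(drop=True)

# Hypothetical sample data mirroring the question's tables.
a = ["20170727L00319", "20170727L00319"]
df1 = pd.DataFrame({"A": a, "B": ["423605030008907", "42360604002461"],
                    "C": ["319", "319"], "Count": [1, 1]})
df2 = pd.DataFrame({"A": a, "B": ["423605030008907", "423606040002461"],
                    "C": ["319", "319"], "Count": [2, 2]})
df3 = pd.DataFrame({"A": a, "B": ["423605030008907", "423606040002461"],
                    "C": [319, 319], "Count": [1, 2]})

result = union_distinct([df1, df2, df3], key_cols=["A", "B", "C"], numeric_cols=["C"])
print(result)
```

With this sample data the result has three distinct rows, which matches the expected output in the question. If C can contain non-numeric text, casting every frame to string with astype(str) instead would be the safer direction.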