
Performing a union on three group-by result DataFrames with the same columns, different order

I have created three different pandas DataFrames by applying a group-by to three different data sets, each with columns A, B, and C, using:

Resultdf=SessionDev.query(AppDetails).filter(text(" A in ('20170727L00319')")).all()

df1= Resultdf.groupby(["A", "B","C"]).size().reset_index(name='Count')
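For readers without access to the database, the group-by step can be reproduced on a small hand-made frame. This is only a sketch: the `rows` frame is a hypothetical stand-in for the query result, with values borrowed from the tables in this question.

```python
import pandas as pd

# Hypothetical stand-in for the query result (not the real table contents).
rows = pd.DataFrame({
    "A": ["20170727L00319"] * 3,
    "B": ["423605030008907", "423605030008907", "423606040002461"],
    "C": ["319", "319", "319"],
})

# Count the rows in each (A, B, C) group and turn the group keys back into columns.
counts = rows.groupby(["A", "B", "C"]).size().reset_index(name="Count")
print(counts)
```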

[df1]

  | A              | B               | C   | Count
0 | 20170727L00319 | 423605030008907 | 319 | 1
1 | 20170727L00319 | 42360604002461  | 319 | 1

[df2]

  | A              | B               | C   | Count
0 | 20170727L00319 | 423605030008907 | 319 | 2
1 | 20170727L00319 | 423606040002461 | 319 | 2

[df3]

  | A              | B               | C   | Count
0 | 20170727L00319 | 423605030008907 | 319 | 1
1 | 20170727L00319 | 423606040002461 | 319 | 2

I want to perform a union (excluding duplicates) on the three grouped DataFrames above, combining them into a single DataFrame with distinct rows.

I tried concatenating the three DataFrames and then removing duplicates with drop_duplicates, but I did not get the expected result:

  | A              | B               | C
0 | 20170727L00319 | 423605030008907 | 319
1 | 20170727L00319 | 423606040002461 | 319
2 | 20170727L00319 | 423605030008907 | 319
3 | 20170727L00319 | 42360604002461  | 319
5 | 20170727L00319 | 423606040002461 | 319

Using:

FinalUnion = pd.concat([df1, df2, df3], ignore_index=True).drop(columns=['Count'])  # join_axes was removed in pandas 1.0; dropping 'Count' after the concat keeps only A, B, C

FinalUnion.drop_duplicates(['B','C'], keep='first')

I am expecting the result below:

  | A              | B               | C
0 | 20170727L00319 | 423605030008907 | 319
1 | 20170727L00319 | 423606040002461 | 319
3 | 20170727L00319 | 42360604002461  | 319

Update:

After performing drop_duplicates on columns A and B, I got a distinct result. But performing drop_duplicates on any other combination of columns seems to fail.

The issue was simple. I had loaded data from three different tables into three different models and then into three different pandas DataFrames, then performed a group-by on each, followed by concat and drop_duplicates to get a distinct result.

Resolution: column C had datatype varchar in the first two tables, whereas in the third table it was bigint, which is why drop_duplicates failed to give the appropriate result. Values such as the string '319' and the integer 319 are not treated as equal, so rows that look identical are not recognized as duplicates.

Changing the datatype gave the exact result. Another way to convert the datatype dynamically is df1[["C"]] = df1[["C"]].apply(pd.to_numeric)
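The type mismatch described in the resolution can be reproduced in isolation. The sketch below assumes two hypothetical one-row frames, `df_varchar` and `df_bigint`, that differ only in the type of column C (string versus integer); they are illustrative and not the original tables.

```python
import pandas as pd

# Hypothetical frames: identical values, but C is a string in one and an integer in the other.
df_varchar = pd.DataFrame({"A": ["20170727L00319"], "B": ["423605030008907"], "C": ["319"]})
df_bigint = pd.DataFrame({"A": ["20170727L00319"], "B": ["423605030008907"], "C": [319]})

combined = pd.concat([df_varchar, df_bigint], ignore_index=True)

# '319' (str) and 319 (int) compare as different values, so both rows survive.
by_b_c = combined.drop_duplicates(["B", "C"])

# A and B hold the same type in both frames, so deduplicating on them works.
by_a_b = combined.drop_duplicates(["A", "B"])

print(len(by_b_c), len(by_a_b))
```

This matches the behavior reported in the update: deduplicating on A and B works, while any subset that includes C does not.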
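Putting the pieces together, one possible way to get the distinct union is to normalize column C in every frame before concatenating. This is a sketch under assumptions: the frames are rebuilt by hand from the tables in the question, B is assumed to be stored as a string everywhere, and `final_union` is an illustrative name.

```python
import pandas as pd

a = "20170727L00319"
# Frames rebuilt by hand from the question; C is a string in df1/df2 and an integer in df3.
df1 = pd.DataFrame({"A": [a, a], "B": ["423605030008907", "42360604002461"],
                    "C": ["319", "319"], "Count": [1, 1]})
df2 = pd.DataFrame({"A": [a, a], "B": ["423605030008907", "423606040002461"],
                    "C": ["319", "319"], "Count": [2, 2]})
df3 = pd.DataFrame({"A": [a, a], "B": ["423605030008907", "423606040002461"],
                    "C": [319, 319], "Count": [1, 2]})

frames = []
for df in (df1, df2, df3):
    df = df.copy()                      # leave the original frames untouched
    df["C"] = pd.to_numeric(df["C"])    # give C the same numeric dtype in every frame
    frames.append(df[["A", "B", "C"]])  # keep only the key columns, in a fixed order

# Concatenate, then drop rows that are identical across A, B, and C.
final_union = (
    pd.concat(frames, ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)
print(final_union)
```

Selecting the columns by name before the concat also sidesteps any difference in column order between the frames.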

