Python Pandas - Concat two data frames with different numbers of rows and columns

I have two data frames with different numbers of rows and columns. Both tables have a few columns in common, including "Customer ID". Their sizes are 11697 rows × 15 columns and 385839 rows × 6 columns respectively. Customer ID may repeat in the second table. I want to combine the two tables and merge the shared columns using Customer ID. How can I do that with pandas?

One table looks like this:

[image]

and the other one looks like this:

[image]

I am using the code below:

 pd.concat([df1, df2], sort=False)

I just want to make sure that I am not losing any information. How can I check whether an ID has multiple entries, and how can I combine them into one result?

EDIT:

When I use the code above, these are the NA counts in the dataset before and after:

[image]

Can someone tell me where I went wrong?

I believe that DataFrame.merge would work in this case:

# use how='outer' to preserve all information from both DataFrames
df1.merge(df2, how='outer', on='customer_id')
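
A minimal sketch of the outer merge on toy data. The frames and the `name` and `amount` columns are made up for illustration, since the real tables are not shown in the question. It also checks for repeated IDs with `duplicated`, and uses the `indicator` and `validate` parameters of `merge` to show where each row came from and to assert the expected one-to-many relationship.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables in the question.
df1 = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
})
df2 = pd.DataFrame({
    "customer_id": [1, 1, 4],   # ID 1 repeats, ID 4 does not exist in df1
    "amount": [10.0, 20.0, 5.0],
})

# Which IDs occur more than once in the second table?
is_repeated = df2["customer_id"].duplicated(keep=False)
repeated_ids = df2.loc[is_repeated, "customer_id"].unique()

# The outer merge keeps every ID from both sides.
merged = df1.merge(
    df2,
    how="outer",
    on="customer_id",
    indicator=True,          # adds a "_merge" column: left_only / right_only / both
    validate="one_to_many",  # raises if customer_id is not unique in df1
)

print(repeated_ids)
print(merged)
```

Columns that exist in both frames but are not listed in `on` come back twice, with `_x` and `_y` suffixes by default.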

DataFrame.join could also work, and is simpler, if both DataFrames have their index set to customer_id:

df1 = df1.set_index('customer_id')
df2 = df2.set_index('customer_id')
df1.join(df2, how='outer')
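
Since the question says the tables share several columns, a plain `join` would likely raise a `ValueError` about overlapping columns. A sketch of one way around that, using the `lsuffix` and `rsuffix` parameters of `join`; the toy frames and the shared `region` column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frames that share a "region" column besides the key.
df1 = pd.DataFrame({
    "customer_id": [1, 2],
    "region": ["N", "S"],
    "name": ["Ann", "Bob"],
})
df2 = pd.DataFrame({
    "customer_id": [1, 1, 3],
    "region": ["N", "N", "E"],
    "amount": [10, 20, 5],
})

left = df1.set_index("customer_id")
right = df2.set_index("customer_id")

# Suffixes disambiguate the column that exists on both sides.
joined = left.join(right, how="outer", lsuffix="_df1", rsuffix="_df2")
print(joined)
```

The suffix names here are arbitrary; any pair of distinct strings works.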

pd.concat will also do the trick here: set axis to 1 to concatenate along the second axis (columns), and set the index to customer_id on both data frames first. Note that this alignment generally requires customer_id to be unique within each frame; if IDs repeat, as they may in the second table, merge is the safer choice.

import pandas as pd
pd.concat([df1.set_index('customer_id'), df2.set_index('customer_id')], axis = 1)
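
Column-wise `concat` aligns rows on the index, so it generally needs each `customer_id` to appear only once per frame. One possible way to get there, sketched on made-up data, is to collapse repeated IDs with `groupby` first. Summing a hypothetical `amount` column is only an illustrative choice of aggregation.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables in the question.
df1 = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
df2 = pd.DataFrame({"customer_id": [1, 1, 4], "amount": [10.0, 20.0, 5.0]})

# Collapse repeated IDs so that the index of the second frame is unique.
df2_unique = df2.groupby("customer_id").agg(
    total_amount=("amount", "sum"),
    n_rows=("amount", "count"),
)

# Both frames are now indexed by a unique customer_id and can be aligned.
combined = pd.concat([df1.set_index("customer_id"), df2_unique], axis=1)
print(combined)
```

IDs that exist in only one frame still get a row, with NaN in the columns that come from the other frame.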

If you want to omit the rows that end up with empty values after the concatenation, use dropna:

pd.concat([df1.set_index('customer_id'), df2.set_index('customer_id')], axis = 1).dropna()
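
Note that a bare `dropna()` removes every row with any missing value, including gaps that were already in the source data. A sketch of two narrower alternatives on toy data (the column names are assumptions): `join='inner'` keeps only the IDs present in both frames, and `dropna(subset=...)` restricts the check to chosen columns.

```python
import pandas as pd

# Hypothetical frames; the name for ID 2 is missing in the source data.
left = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Ann", None, "Cy"]}
).set_index("customer_id")
right = pd.DataFrame(
    {"customer_id": [1, 2, 4], "amount": [10.0, 20.0, 5.0]}
).set_index("customer_id")

outer = pd.concat([left, right], axis=1)

# Drops ID 2 as well, although it exists in both frames.
dropped_all = outer.dropna()

# Keeps only the IDs found in both frames, whatever their values are.
inner = pd.concat([left, right], axis=1, join="inner")

# Drops only the rows that have no amount.
with_amount = outer.dropna(subset=["amount"])

print(dropped_all.index.tolist())
print(inner.index.tolist())
print(with_amount.index.tolist())
```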
