
Merge dataframe with different lengths

I am merging two dataframes of different lengths with the following code:

df1 = pd.merge(df1, df2, on='OFFERING_ID', how='left')

The number of rows before the merge is 400 0000; after the merge it is 600000.

How can I solve this?

Thanks

The problem isn't the lengths; it's OFFERING_ID.

In short, OFFERING_ID isn't unique in the second dataframe. A merge produces one output row per matching pair of rows, so each df1 row whose OFFERING_ID matches n rows in df2 becomes n rows, and the result has more rows than the original.
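You can confirm this before merging by checking whether the key repeats in the right-hand frame, or let pandas enforce uniqueness through the `validate` argument of `merge`, which raises `pandas.errors.MergeError` when the right-hand keys are not unique. A minimal sketch; the two small frames are made-up stand-ins for the real data:

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical stand-in frames: OFFERING_ID 1 appears twice in df2
df1 = pd.DataFrame({"OFFERING_ID": [1, 2, 3], "a": ["x", "y", "z"]})
df2 = pd.DataFrame({"OFFERING_ID": [1, 1, 2], "b": ["p", "q", "r"]})

# Which keys occur more than once on the right-hand side?
dup_keys = df2.loc[df2["OFFERING_ID"].duplicated(keep=False), "OFFERING_ID"].unique()
print(dup_keys)  # [1]

# validate="many_to_one" makes the merge fail loudly instead of silently adding rows
rejected = False
try:
    pd.merge(df1, df2, on="OFFERING_ID", how="left", validate="many_to_one")
except MergeError as err:
    rejected = True
    print("merge rejected:", err)
```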

I made an example on repl.it; the code is also pasted below:

import pandas as pd

df1 = pd.DataFrame(
    [
        {"OFFERING_ID": 1, "another_field": "whatever"},
        {"OFFERING_ID": 2, "another_field": "whatever"},
        {"OFFERING_ID": 3, "another_field": "whatever"},
        {"OFFERING_ID": 4, "another_field": "whatever"},
    ]
)

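# OFFERING_ID 1 appears three times in df2, so the key is not unique there
df2 = pd.DataFrame(
    [
        {"OFFERING_ID": 1, "another_field": "whatever"},
        {"OFFERING_ID": 1, "another_field": "whatever"},
        {"OFFERING_ID": 1, "another_field": "whatever"},
    ]
)

print(df1.shape)  # (4, 2)
print(df2.shape)  # (3, 2)
# The df1 row with OFFERING_ID 1 is repeated once per match in df2: 3 + 1 + 1 + 1 rows
print(pd.merge(df1, df2, on="OFFERING_ID", how="left").shape)  # (6, 3)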
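If you only need one row of df2 per OFFERING_ID, one way to keep the left merge at the original row count is to deduplicate the key in df2 first. This sketch assumes that keeping the first matching row is acceptable; if the duplicates carry different values, aggregating them (for example with `groupby`) may be the better choice:

```python
import pandas as pd

df1 = pd.DataFrame({"OFFERING_ID": [1, 2, 3, 4], "another_field": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"OFFERING_ID": [1, 1, 1], "another_field": ["x", "y", "z"]})

# Keep a single row per OFFERING_ID on the right-hand side (assumption: the first one is fine)
df2_unique = df2.drop_duplicates(subset="OFFERING_ID", keep="first")

# validate now passes, and the row count matches df1
merged = pd.merge(df1, df2_unique, on="OFFERING_ID", how="left", validate="many_to_one")
print(merged.shape)  # (4, 3)
print(len(merged) == len(df1))  # True
```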
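
offering_id_dfs = []
for offering_id in df1.OFFERING_ID.unique():
    # Rows for this ID in each frame, re-indexed 0..n-1 so concat pairs them up by position
    sub_df1 = df1.loc[df1.OFFERING_ID == offering_id].drop(columns="OFFERING_ID").reset_index(drop=True)
    sub_df2 = df2.loc[df2.OFFERING_ID == offering_id].drop(columns="OFFERING_ID").reset_index(drop=True)
    # Place the two blocks side by side instead of forming every combination
    concat_df = pd.concat([sub_df1, sub_df2], axis=1)
    concat_df.insert(0, "OFFERING_ID", offering_id)
    offering_id_dfs.append(concat_df)
df3 = pd.concat(offering_id_dfs).reset_index(drop=True)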

This pairs the rows for each OFFERING_ID side by side instead of producing every combination. It might work as long as each DataFrame contains only one column besides OFFERING_ID and every value in df2.OFFERING_ID.unique() is also in df1.OFFERING_ID.unique(); the loop only iterates over df1's IDs, so df2 rows with any other ID are dropped.
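The same side-by-side pairing can also be written without the Python loop by numbering the repeats of each OFFERING_ID with `groupby().cumcount()` and merging on the ID plus that occurrence number. This is a swapped-in technique rather than the answer's own code; the helper column name `occurrence` and the sample data are made up for illustration, and the final filter mirrors the loop by keeping only IDs present in df1:

```python
import pandas as pd

df1 = pd.DataFrame({"OFFERING_ID": [1, 1, 2], "left_value": ["a", "b", "c"]})
df2 = pd.DataFrame({"OFFERING_ID": [1, 1, 1, 3], "right_value": ["x", "y", "z", "w"]})

# Number each row within its OFFERING_ID group: 0, 1, 2, ... (hypothetical helper column)
left = df1.assign(occurrence=df1.groupby("OFFERING_ID").cumcount())
right = df2.assign(occurrence=df2.groupby("OFFERING_ID").cumcount())

# Pair the k-th left row with the k-th right row of the same ID; the outer merge keeps
# unpaired rows from either side, like the positional concat in the loop
paired = pd.merge(left, right, on=["OFFERING_ID", "occurrence"], how="outer")

# Like the loop, keep only IDs that exist in df1, then drop the helper column
paired = (
    paired[paired["OFFERING_ID"].isin(df1["OFFERING_ID"])]
    .drop(columns="occurrence")
    .reset_index(drop=True)
)
print(paired)
```

Each ID ends up with max(count in df1, count in df2) rows, the same as the loop, rather than the product of the two counts that a plain merge gives.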
