
How to merge two dataframes side-by-side?

Is there a way to conveniently merge two data frames side by side?

Both data frames have 30 rows but different numbers of columns: say, df1 has 20 columns and df2 has 40 columns.

How can I easily get a new data frame of 30 rows and 60 columns?

df3 = pd.someSpecialMergeFunct(df1, df2)

Or maybe there is some special parameter in append:

df3 = pd.append(df1, df2, left_index=False, right_index=false, how='left')

PS: If possible, I hope duplicated column names can be resolved automatically.

Thanks!

You can use the concat function for this (axis=1 concatenates along columns):

pd.concat([df1, df2], axis=1)

See the pandas docs on merging/concatenating: http://pandas.pydata.org/pandas-docs/stable/merging.html
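
To make the concat approach concrete, here is a minimal sketch. The small frames and their column names are made up for illustration (they stand in for the 30-row df1 and df2), and add_suffix is only one possible way to deal with the duplicated column names mentioned in the question, since a plain concat keeps duplicates as they are.

```python
import pandas as pd

# Hypothetical small frames standing in for the 30-row df1 and df2.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"b": [7, 8, 9], "c": [10, 11, 12]})

# Plain side-by-side concatenation: rows are matched on the index,
# and both "b" columns are kept under the same name.
df3 = pd.concat([df1, df2], axis=1)
print(df3.shape)          # (3, 4)
print(list(df3.columns))  # ['a', 'b', 'b', 'c']

# One way to disambiguate: suffix every column with its source first.
df3_unique = pd.concat([df1.add_suffix("_1"), df2.add_suffix("_2")], axis=1)
print(list(df3_unique.columns))  # ['a_1', 'b_1', 'b_2', 'c_2']
```

Because concat aligns on the index, this only gives a clean side-by-side result when both frames share the same index.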

I came across your question while I was trying to achieve something like the following:

[Image: merging dataframes horizontally]

Once I had sliced my dataframes, I first ensured that their indexes were the same. In your case, both dataframes need to be indexed from 0 to 29. I then merged both dataframes on the index.

df1.reset_index(drop=True).merge(df2.reset_index(drop=True), left_index=True, right_index=True)
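
A small sketch of the index-based merge, using made-up frames whose indexes differ (as they might after slicing). The suffixes argument is an addition for illustration: it shows one way merge can rename overlapping column names automatically.

```python
import pandas as pd

# Hypothetical frames whose indexes do not match (e.g. after slicing).
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[10, 11, 12])
df2 = pd.DataFrame({"b": [7, 8, 9], "c": [0, 0, 0]}, index=[0, 1, 2])

# Reset both indexes to 0..n-1 so rows pair up by position.
left = df1.reset_index(drop=True)
right = df2.reset_index(drop=True)

# Merge on the index; the overlapping column "b" gets a suffix on each side.
merged = left.merge(right, left_index=True, right_index=True,
                    suffixes=("_df1", "_df2"))
print(list(merged.columns))  # ['a', 'b_df1', 'b_df2', 'c']
print(merged.shape)          # (3, 4)
```

If suffixes is omitted, merge falls back to its defaults, "_x" and "_y".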

If you want to combine 2 dataframes on a common column name, you can do the following:

df_concat = pd.merge(df1, df2, on='common_column_name', how='outer')
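
For illustration, a sketch of what the outer merge does, with a hypothetical key column named "key" and made-up values. Note that this matches rows by key value rather than by position, so the row count can differ from that of either input.

```python
import pandas as pd

# Hypothetical frames sharing a column named "key".
left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "y": [20.0, 30.0, 40.0]})

# Outer merge keeps every key found in either frame;
# cells with no match on one side are filled with NaN.
df_concat = pd.merge(left, right, on="key", how="outer")
print(df_concat.shape)  # (4, 3)
```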

I found that the other answers didn't cut it for me when coming in from Google.

What I did instead was to set the new columns in place in the original df.

# list(df2.columns) gives you the column names of df2
# you then use these as the column names for df

df[ list(df2.columns) ] = df2
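
One caveat worth sketching: assigning a DataFrame this way aligns rows on the index, so labels that do not match end up as NaN. The frames below are made up to show that behaviour.

```python
import pandas as pd

# Hypothetical frames with partially overlapping indexes.
df = pd.DataFrame({"a": [1, 2, 3]})                        # index 0, 1, 2
df2 = pd.DataFrame({"b": [10, 20, 30]}, index=[1, 2, 3])   # index 1, 2, 3

# Assignment aligns on index labels, not on row position.
df[list(df2.columns)] = df2
print(df["b"].tolist())  # [nan, 10.0, 20.0]
```

Row 0 has no counterpart in df2 and the row labelled 3 in df2 is dropped, so this approach is only safe when both frames share the same index.
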
There is a way: you can do it via a Pipeline.

Use a pipeline to transform your numerical data, for example:

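from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

# DataFrameSelector is not part of scikit-learn; it is assumed to be a user-defined transformer.
num_pipeline = Pipeline([
    ("select_numeric", DataFrameSelector([columns with numerical value])),
    ("imputer", SimpleImputer(strategy="median")),
])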

And for categorical data:

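from sklearn.preprocessing import OneHotEncoder

cat_pipeline = Pipeline([
    ("select_cat", DataFrameSelector([columns with categorical data])),
    # scikit-learn >= 1.2 uses sparse_output; older versions use sparse=False
    ("cat_encoder", OneHotEncoder(sparse_output=False)),
])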

Then use a FeatureUnion to combine these transformations:

preprocess_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])

This solution also works if df1 and df2 have different indexes:

df1.loc[:, df2.columns] = df2.to_numpy()
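
A minimal sketch of why this works: to_numpy() drops the index of df2, so the values are written row by row in positional order. The frames are hypothetical, and the length check is an added safeguard, not part of the original answer.

```python
import pandas as pd

# Hypothetical frames with completely different indexes.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"b": [10, 20, 30], "c": [7, 8, 9]}, index=[0, 1, 2])

# Positional assignment requires the same number of rows.
assert len(df1) == len(df2)

# The NumPy array carries no index, so no label alignment takes place.
df1.loc[:, df2.columns] = df2.to_numpy()
print(list(df1.columns))  # ['a', 'b', 'c']
```

The result keeps the index of df1. If df2 mixes column types, to_numpy() returns an object array, so the new columns may come out with object dtype.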
