Inserting several columns into another DataFrame

Say I have the following DataFrame df1:

name    course   yob     city
paul    A        1995    london
john    A        2005    berlin
stacy   B        2015    vienna
mark    D        2013    madrid

And also the following DataFrame df2:

name    height   occupation   
paul    185      student    
mark    162      pilot

I want to combine them to obtain:

name    course   height   occupation   yob     city
paul    A        185      student      1995    london
john    A        NaN      NaN          2005    berlin
stacy   B        NaN      NaN          2015    vienna
mark    D        162      pilot        2013    madrid

So the idea is that df1 is my main data structure, and I want to insert the columns of df2 (which only has information for some of the names) at a specific location in df1, in this case between the columns course and yob. The ordering of the columns is important and shouldn't be changed.

What would be the most straightforward/elegant way of doing this?

It's not clear whether you want a left or an outer join. Here is a simple way to do a left join.

I am using the first DataFrame as df1 and the second DataFrame as df2 to build the result:

import pandas as pd

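# Keep every row of df1; names missing from df2 get NaN in the new columns
df_result = pd.merge(left=df1, right=df2, how='left', on='name')
# Reorder the columns so that df2's columns sit between course and yob
df_result = df_result[["name", "course", "height", "occupation", "yob", "city"]]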

print(df_result)
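
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The two DataFrames are simply typed in from the tables in the question; everything else is the same left merge and column reordering as above.

```python
import pandas as pd

# df1 and df2 rebuilt by hand from the tables in the question
df1 = pd.DataFrame({
    "name": ["paul", "john", "stacy", "mark"],
    "course": ["A", "A", "B", "D"],
    "yob": [1995, 2005, 2015, 2013],
    "city": ["london", "berlin", "vienna", "madrid"],
})
df2 = pd.DataFrame({
    "name": ["paul", "mark"],
    "height": [185, 162],
    "occupation": ["student", "pilot"],
})

# Left join keeps all four rows of df1
df_result = pd.merge(left=df1, right=df2, how="left", on="name")
# Put df2's columns between course and yob
df_result = df_result[["name", "course", "height", "occupation", "yob", "city"]]

print(df_result)
```

Note that height comes back as a float column, because NaN cannot be stored in a plain integer column.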

If you want an outer join instead:

df_result = pd.merge(left=df1, right=df2, how='outer', on='name')
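
With the data in the question, both join types give the same rows, because every name in df2 also appears in df1. The difference only shows up when df2 contains a name that df1 does not. A small sketch of that, using cut-down frames and a made-up extra name ("anna", not part of the question):

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["paul", "john"], "course": ["A", "A"]})
# "anna" is a hypothetical name that does not appear in df1
df2 = pd.DataFrame({"name": ["paul", "anna"], "height": [185, 170]})

left = pd.merge(df1, df2, how="left", on="name")    # only names from df1
outer = pd.merge(df1, df2, how="outer", on="name")  # names from both frames

print(len(left), len(outer))
print(sorted(outer["name"]))
```

The row order of an outer join may not match the order of df1, so if row order matters a left join is probably the safer choice.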

A more general approach is to merge first, then build the target column order by inserting the columns that exist only in df2 into the middle of df1's column list (which here falls between course and yob), and apply that order with reindex():

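final = df1.merge(df2, on='name', how='left')
l = list(df1.columns)
# Columns that exist only in df2 (note: Index.difference returns them sorted)
new_cols = list(df2.columns.difference(df1.columns))
# Splice the new columns into the middle of df1's column list
s = l[:len(l)//2] + new_cols + l[len(l)//2:]
# ['name', 'course', 'height', 'occupation', 'yob', 'city']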

Then use reindex() on axis=1:

final = final.reindex(s, axis=1)
print(final)

    name course  height occupation   yob    city
0   paul      A   185.0    student  1995  london
1   john      A     NaN        NaN  2005  berlin
2  stacy      B     NaN        NaN  2015  vienna
3   mark      D   162.0      pilot  2013  madrid
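
Splitting at the midpoint happens to work here, but the question asks for a specific location. One possible variation is to look up the position of a named column and insert after it. The helper below, insert_columns_after, is a hypothetical name of my own, not a pandas function, and it assumes the join key is the only column the two frames share (otherwise merge would add _x/_y suffixes and the final selection would fail). It also keeps df2's columns in their original order rather than sorting them.

```python
import pandas as pd

def insert_columns_after(left, right, key, after):
    """Left-merge `right` into `left` on `key` and place the columns
    that come from `right` immediately after the column `after`.

    Hypothetical helper; assumes `key` is the only shared column.
    """
    merged = left.merge(right, on=key, how="left")
    # Columns only present in `right`, in their original order
    new_cols = [c for c in right.columns if c not in left.columns]
    cols = list(left.columns)
    pos = cols.index(after) + 1  # insertion point just past `after`
    return merged[cols[:pos] + new_cols + cols[pos:]]

df1 = pd.DataFrame({
    "name": ["paul", "john", "stacy", "mark"],
    "course": ["A", "A", "B", "D"],
    "yob": [1995, 2005, 2015, 2013],
    "city": ["london", "berlin", "vienna", "madrid"],
})
df2 = pd.DataFrame({
    "name": ["paul", "mark"],
    "height": [185, 162],
    "occupation": ["student", "pilot"],
})

final = insert_columns_after(df1, df2, key="name", after="course")
print(final)
```

Passing a different column as after (for example "name") moves the inserted block without touching the rest of the code.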
