
Adding a new column in pandas dataframe from another dataframe with differing indices

This is my original dataframe.

df1

This is my second dataframe, which contains one column.

DF2

I want to add the column of the second dataframe to the end of the original dataframe. The indices of the two dataframes are different. I tried this:

df1['RESULT'] = df2['RESULT']

It doesn't raise an error and the column is added, but all values are NaN. How do I add the column with its values?

Assuming your dataframes have the same number of rows, you can assign RESULT_df['RESULT'].values to your original dataframe. This way, you don't have to worry about index alignment.

# pre 0.24
feature_file_df['RESULT'] = RESULT_df['RESULT'].values
# >= 0.24
feature_file_df['RESULT'] = RESULT_df['RESULT'].to_numpy()

Minimal Code Sample

df
          A         B
0 -1.202564  2.786483
1  0.180380  0.259736
2 -0.295206  1.175316
3  1.683482  0.927719
4 -0.199904  1.077655

df2

           C
11 -0.140670
12  1.496007
13  0.263425
14 -0.557958
15 -0.018375

Let's try direct assignment first.

df['C'] = df2['C']
df

          A         B   C
0 -1.202564  2.786483 NaN
1  0.180380  0.259736 NaN
2 -0.295206  1.175316 NaN
3  1.683482  0.927719 NaN
4 -0.199904  1.077655 NaN
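Direct assignment produces NaN here because pandas matches rows by index label, not by position, and the two frames share no labels. A small sketch with partially overlapping labels makes this visible (the frames `left` and `right` and their values are made up for illustration):

```python
import pandas as pd

# Hypothetical data: `left` is labeled 0..2 and `right` is labeled 2..4,
# so only the label 2 appears in both frames.
left = pd.DataFrame({"A": [10, 20, 30]}, index=[0, 1, 2])
right = pd.DataFrame({"C": [1.5, 2.5, 3.5]}, index=[2, 3, 4])

# Assigning a Series aligns it to left's index by label.
# Labels missing from `right` become NaN; labels 3 and 4 are dropped.
left["C"] = right["C"]

print(left)
```

Only the row labeled 2 receives a value; the other rows are NaN.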

Now, assign the array returned by .values (or .to_numpy() for pandas versions >= 0.24). .values returns a numpy array, which does not have an index.

df2['C'].values 
array([-0.141,  1.496,  0.263, -0.558, -0.018])

df['C'] = df2['C'].values
df

          A         B         C
0 -1.202564  2.786483 -0.140670
1  0.180380  0.259736  1.496007
2 -0.295206  1.175316  0.263425
3  1.683482  0.927719 -0.557958
4 -0.199904  1.077655 -0.018375
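The same steps as a self-contained script might look like the sketch below. The data values are invented, and the explicit length check is an addition for illustration rather than part of the answer above; pandas would raise its own ValueError on a length mismatch anyway.

```python
import pandas as pd

# Hypothetical frames with the same number of rows but different index labels.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0]})                       # index 0..2
df2 = pd.DataFrame({"C": [0.1, 0.2, 0.3]}, index=[11, 12, 13])

# Positional assignment only makes sense when the row counts match.
if len(df) != len(df2):
    raise ValueError("row counts differ, cannot assign by position")

# The array carries no index, so values are placed row by row.
df["C"] = df2["C"].to_numpy()   # pandas >= 0.24; use .values on older versions

print(df)
```

Note that this relies entirely on row order: if the two frames are sorted differently, the values will be paired with the wrong rows without any warning.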

You can also call set_axis() to change the index of a dataframe or column. So if the lengths are the same, you can use set_axis() to coerce the index of one dataframe to match the other's.

df1['A'] = df2['A'].set_axis(df1.index)
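As a runnable sketch of the line above, with made-up frames (the column name `X` and all values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frames: same length, different index labels.
df1 = pd.DataFrame({"X": [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({"A": [7, 8, 9]}, index=[11, 12, 13])

# set_axis returns a new Series relabeled with df1's index;
# df2 itself is left unchanged.
relabeled = df2["A"].set_axis(df1.index)

# Labels now match, so normal index alignment pairs the rows as intended.
df1["A"] = relabeled

print(df1)
```

Unlike assigning a bare array, the result is still a labeled Series, so it can be passed to methods such as join() that expect an index.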

If you get a SettingWithCopyWarning, you can silence it by creating a copy with either join() or assign().

df1 = df1.join(df2['A'].set_axis(df1.index))
# or
df1 = df1.assign(new_col = df2['A'].set_axis(df1.index))

set_axis() is especially useful if you want to add multiple columns from another dataframe. You can call join() after calling it on the new dataframe.

df1 = df1.join(df2[['A', 'B', 'C']].set_axis(df1.index))
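A self-contained version of the multi-column case could look like this; the frames and their values are invented for illustration:

```python
import pandas as pd

# Hypothetical frames: same number of rows, different index labels.
df1 = pd.DataFrame({"X": [1, 2, 3]})                      # index 0..2
df2 = pd.DataFrame(
    {"A": [10, 20, 30], "B": [0.1, 0.2, 0.3], "C": ["a", "b", "c"]},
    index=[11, 12, 13],
)

# Relabel the selected columns with df1's row index (axis=0),
# then join on the now-matching labels.
new_cols = df2[["A", "B", "C"]].set_axis(df1.index, axis=0)
df1 = df1.join(new_cols)

print(df1)
```

join() raises an error if a column name exists in both frames, so rename the incoming columns or pass lsuffix/rsuffix in that case.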
