Update multiple DataFrame columns from another DataFrame

Question

I have two dataframes:

df1

         Date         ID Company     Symbol  Shares  Value
0  2014-09-30   32511907     foo        NaN      10    101
1  2014-09-30   32511957     bar  No Symbol      10    101
2  2014-12-31   32511907    test  No Symbol      18    152
3  2013-06-30   32511107      AA        APC      43    373
4  2014-06-30  166764100      CC        CVX      29    381
5  2014-09-30  166764900       C  No Symbol      13    155
6  2014-09-30  166764950   C Inc  No Symbol      13    155
7  2015-09-30  475643276    Test        NaN      13    155

And df2:

   Company         ID     Symbol
0   AA Inc   32511907        AAA
1   AA Inc   32511957  No Symbol
2   BB Inc   32511107        APC
3  CC Corp  166764100        CVX
4   CC Inc  166764900  No Symbol
5   CC Inc  166764950  No Symbol

Using ID as the key, I need to replace Symbol and Company in df1 with the values from df2, so that the output looks like:

         Date         ID Company     Symbol  Shares  Value
0  2014-09-30   32511907  AA Inc        AAA      10    101
1  2014-09-30   32511957  AA Inc  No Symbol      10    101
2  2014-12-31   32511907  AA Inc        AAA      18    152
3  2013-06-30   32511107  BB Inc        APC      43    373
4  2014-06-30  166764100 CC Corp        CVX      29    381
5  2014-09-30  166764900  CC Inc  No Symbol      13    155
6  2014-09-30  166764950  CC Inc  No Symbol      13    155
7  2015-09-30  475643276    Test        NaN      13    155

I have tried:

df = df1.set_index('ID')
df.update(df2.set_index('ID'))
df = df1.loc[df1.Symbol == '', ['Company', 'Symbol']] = df.reset_index()

But when I run this on the full dataset, I get ValueError: cannot reindex from a duplicate axis, because the line df.update(df2.set_index('ID')) makes ID the index, which contains a few duplicates.

So are there any alternatives for achieving the desired output?

Answer 1

A pd.merge can solve this, assuming you only want to keep the IDs that appear in df1:

import pandas as pd
# Keep the non-key columns of df1; Company and Symbol are taken from df2.
result = pd.merge(left=df1[['Date', 'ID', 'Shares', 'Value']],
                  right=df2, on=['ID'], how='left')
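
To make the merge concrete, here is a minimal sketch that rebuilds the sample frames from the question and runs it end to end. Two parts are my own additions rather than part of the answer: the validate="many_to_one" argument, which is an optional guard against repeated IDs in df2, and the fillna step, which assumes that rows with no match in df2 (ID 475643276 here) should keep their original df1 values, as the desired output suggests.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Date": ["2014-09-30", "2014-09-30", "2014-12-31", "2013-06-30",
             "2014-06-30", "2014-09-30", "2014-09-30", "2015-09-30"],
    "ID": [32511907, 32511957, 32511907, 32511107,
           166764100, 166764900, 166764950, 475643276],
    "Company": ["foo", "bar", "test", "AA", "CC", "C", "C Inc", "Test"],
    "Symbol": [np.nan, "No Symbol", "No Symbol", "APC",
               "CVX", "No Symbol", "No Symbol", np.nan],
    "Shares": [10, 10, 18, 43, 29, 13, 13, 13],
    "Value": [101, 101, 152, 373, 381, 155, 155, 155],
})
df2 = pd.DataFrame({
    "Company": ["AA Inc", "AA Inc", "BB Inc", "CC Corp", "CC Inc", "CC Inc"],
    "ID": [32511907, 32511957, 32511107, 166764100, 166764900, 166764950],
    "Symbol": ["AAA", "No Symbol", "APC", "CVX", "No Symbol", "No Symbol"],
})

# Left merge: every df1 row is kept; Company and Symbol come from df2.
# validate="many_to_one" is an optional guard that raises if df2 repeats an ID.
merged = pd.merge(
    left=df1[["Date", "ID", "Shares", "Value"]],
    right=df2, on="ID", how="left", validate="many_to_one",
)

# ID 475643276 has no match in df2, so fall back to df1's original values.
# This relies on a left merge preserving the row order of df1.
for col in ["Company", "Symbol"]:
    merged[col] = merged[col].fillna(df1[col])

# Restore the column order used in the question.
result = merged[df1.columns]
print(result)
```

If df2 can contain the same ID more than once in the full dataset, a left merge would duplicate the matching df1 rows, so the validate guard (or a df2.drop_duplicates('ID') beforehand) may be worth keeping.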

If the key column has a different name in each dataframe, you can use left_on and right_on instead of on.
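
As a small illustration of left_on and right_on, the sketch below uses two made-up frames where the key is called ID on one side and SecurityID on the other. The column name SecurityID and the data are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical frames: the key is "ID" on the left and "SecurityID" on the right.
left = pd.DataFrame({"ID": [1, 2, 3], "Shares": [10, 18, 43]})
right = pd.DataFrame({"SecurityID": [1, 2], "Symbol": ["AAA", "APC"]})

merged = pd.merge(left, right, left_on="ID", right_on="SecurityID", how="left")

# Both key columns survive the merge, so drop the redundant one.
merged = merged.drop(columns="SecurityID")
print(merged)
```

Unlike on, this form keeps both key columns in the result, which is why the extra drop is there.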

Change the how argument to specify which IDs to keep:

keep all IDs that occur in df1 ('left')
keep all IDs that occur in df2 ('right')
keep all IDs that occur in either df1 or df2 ('outer')
keep only the IDs that occur in both df1 and df2 ('inner')
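
The four options above can be compared side by side with a short sketch. The two frames are small stand-ins of my own, not the question's data: ID 3 exists only on the left and ID 4 only on the right.

```python
import pandas as pd

# Small stand-ins for df1 and df2: ID 3 is only on the left, ID 4 only on the right.
left = pd.DataFrame({"ID": [1, 2, 3], "Shares": [10, 18, 43]})
right = pd.DataFrame({"ID": [1, 2, 4], "Symbol": ["AAA", "APC", "CVX"]})

# Collect the sorted IDs that each join type keeps.
kept = {
    how: sorted(pd.merge(left, right, on="ID", how=how)["ID"].tolist())
    for how in ["left", "right", "outer", "inner"]
}
for how, ids in kept.items():
    print(how, ids)
```

For the question's case, 'left' is the natural choice because every row of df1 should appear in the output exactly once.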

Solution 1
0 2020-04-10 22:11:36
