Update Multiple Dataframe Columns From Other Dataframe

I have two dataframes:

df1

         Date         ID Company     Symbol  Shares  Value
0  2014-09-30   32511907     foo        NaN      10    101
1  2014-09-30   32511957     bar  No Symbol      10    101
2  2014-12-31   32511907    test  No Symbol      18    152
3  2013-06-30   32511107      AA        APC      43    373
4  2014-06-30  166764100      CC        CVX      29    381
5  2014-09-30  166764900       C  No Symbol      13    155
6  2014-09-30  166764950   C Inc  No Symbol      13    155
7  2015-09-30  475643276    Test        NaN      13    155

And df2:

   Company         ID     Symbol
0   AA Inc   32511907        AAA
1   AA Inc   32511957  No Symbol
2   BB Inc   32511107        APC
3  CC Corp  166764100        CVX
4   CC Inc  166764900  No Symbol
5   CC Inc  166764950  No Symbol

Using ID as the key, I need to replace Symbol and Company in df1 with those in df2, so that the output looks like:

         Date         ID Company     Symbol  Shares  Value
0  2014-09-30   32511907  AA Inc        AAA      10    101
1  2014-09-30   32511957  AA Inc  No Symbol      10    101
2  2014-12-31   32511907  AA Inc        AAA      18    152
3  2013-06-30   32511107  BB Inc        APC      43    373
4  2014-06-30  166764100 CC Corp        CVX      29    381
5  2014-09-30  166764900  CC Inc  No Symbol      13    155
6  2014-09-30  166764950  CC Inc  No Symbol      13    155
7  2015-09-30  475643276    Test        NaN      13    155

I have tried:

df = df1.set_index('ID')
df.update(df2.set_index('ID'))
df = df1.loc[df1.Symbol == '', ['Company', 'Symbol']] = df.reset_index()

But when performed on the full dataset, I get ValueError: cannot reindex from a duplicate axis, because the line df.update(df2.set_index('ID')) makes ID the index, which results in a few duplicates.

So are there any alternatives for achieving the desired output?

pd.merge can solve this, assuming you only want to keep the IDs that appear in df1:

import pandas as pd
# Keep the non-key columns of df1 and pull Company and Symbol from df2 by ID.
result = pd.merge(left=df1[['Date', 'ID', 'Shares', 'Value']],
                  right=df2, on=['ID'], how='left')
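
One thing to watch with a left merge: any ID that is missing from df2 (475643276 in the sample data) comes back with NaN in Company and Symbol, while the desired output keeps the original df1 values for that row. A minimal sketch of one way to handle this, assuming the IDs in df2 are unique and df1 has a default integer index, is to fill those gaps from df1 after merging. The names merged and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Date": ["2014-09-30", "2014-09-30", "2014-12-31", "2013-06-30",
             "2014-06-30", "2014-09-30", "2014-09-30", "2015-09-30"],
    "ID": [32511907, 32511957, 32511907, 32511107,
           166764100, 166764900, 166764950, 475643276],
    "Company": ["foo", "bar", "test", "AA", "CC", "C", "C Inc", "Test"],
    "Symbol": [np.nan, "No Symbol", "No Symbol", "APC", "CVX",
               "No Symbol", "No Symbol", np.nan],
    "Shares": [10, 10, 18, 43, 29, 13, 13, 13],
    "Value": [101, 101, 152, 373, 381, 155, 155, 155],
})
df2 = pd.DataFrame({
    "Company": ["AA Inc", "AA Inc", "BB Inc", "CC Corp", "CC Inc", "CC Inc"],
    "ID": [32511907, 32511957, 32511107, 166764100, 166764900, 166764950],
    "Symbol": ["AAA", "No Symbol", "APC", "CVX", "No Symbol", "No Symbol"],
})

# A left merge keeps every row of df1, in df1's order.
merged = pd.merge(df1[["Date", "ID", "Shares", "Value"]], df2,
                  on="ID", how="left")

# IDs absent from df2 yield NaN; fall back to the original df1 values.
# Index alignment is safe here only under the stated assumptions: df1 has a
# default RangeIndex and df2 has unique IDs, so there is one row per df1 row.
for col in ["Company", "Symbol"]:
    merged[col] = merged[col].fillna(df1[col])

# Restore the original column order of df1.
result = merged[df1.columns]
print(result)
```

If df2 contained the same ID more than once, the merge would produce extra rows and the positional fallback above would no longer line up, so it may be worth dropping duplicate IDs from df2 first.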

If the key column has a different name in each dataframe, you can use left_on and right_on.
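
As a small illustration of left_on and right_on, suppose the lookup table named its key differently. The column name SecurityID below is hypothetical and does not come from the question.

```python
import pandas as pd

left = pd.DataFrame({"ID": [1, 2, 3], "Shares": [10, 18, 43]})
# Hypothetical: the lookup table calls the key "SecurityID" instead of "ID".
right = pd.DataFrame({"SecurityID": [1, 2], "Symbol": ["AAA", "APC"]})

merged = pd.merge(left, right, left_on="ID", right_on="SecurityID", how="left")

# Both key columns are kept in the result, so drop the redundant one.
merged = merged.drop(columns="SecurityID")
print(merged)
```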

Change the how argument to specify which IDs to keep:

  • keep only the IDs that occur in df1 ('left')
  • keep only the IDs that occur in df2 ('right')
  • keep all IDs that occur in either df1 or df2 ('outer')
  • keep only the IDs that occur in both df1 and df2 ('inner')
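
The four options can be compared on a toy example. The frames a and b below are made up for illustration: ID 1 exists only in a, ID 4 only in b.

```python
import pandas as pd

a = pd.DataFrame({"ID": [1, 2, 3], "Shares": [10, 18, 43]})
b = pd.DataFrame({"ID": [2, 3, 4], "Symbol": ["APC", "CVX", "AAA"]})

# Collect the IDs that survive each kind of merge.
ids_by_how = {
    how: sorted(pd.merge(a, b, on="ID", how=how)["ID"].tolist())
    for how in ["left", "right", "outer", "inner"]
}

for how, ids in ids_by_how.items():
    print(how, ids)
# left [1, 2, 3]
# right [2, 3, 4]
# outer [1, 2, 3, 4]
# inner [2, 3]
```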
