
Python pandas conditional replace string based on column values

Given these data frames:

import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
DF

  COL1   COL2  year
0    A  11032  2016
1    B   1960  2017
2    C  11400  2018
3    D  11355  2019
4    D      8  2020
5    D      7  2021

DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})
DF2

  ColX  ColY  ColZ
0    D  2021   100

If the following conditions are met:

COL1 = ColX from DF2

year = ColY from DF2

then change the value in COL2 to the value of ColZ from DF2.
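For a small DF2 like this one, the two conditions can also be written out directly as a boolean mask. The sketch below is a plain masking approach, not either of the methods in the answer: it assumes DF2 has only a few rows, loops over them, and uses .loc to overwrite COL2 wherever COL1 equals ColX and year equals ColY. The loop variable names are illustrative.

```python
import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})

# Each DF2 row is one replacement rule: (ColX, ColY) -> ColZ
for rule in DF2.itertuples(index=False):
    # True only on rows where both conditions from the question hold
    mask = (DF['COL1'] == rule.ColX) & (DF['year'] == rule.ColY)
    # Overwrite COL2 on exactly those rows
    DF.loc[mask, 'COL2'] = rule.ColZ

print(DF)
```

Because this scans DF once per DF2 row, it is only a reasonable choice when DF2 is short; for larger lookup tables a join-based approach scales better.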

This looks like you want to update DF with data from DF2.

Assuming that each (ColX, ColY) pair appears at most once in DF2:

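# Attach ColZ to every row of DF whose (COL1, year) matches DF2's (ColX, ColY)
DF = DF.merge(DF2.set_index(['ColX', 'ColY'])[['ColZ']],
              how='left',
              left_on=['COL1', 'year'],
              right_index=True)
# Overwrite COL2 in place wherever ColZ is not null
DF.COL2.update(DF.ColZ)
# Remove the helper column
del DF['ColZ']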

>>> DF
  COL1   COL2  year
0    A  11032  2016
1    B   1960  2017
2    C  11400  2018
3    D  11355  2019
4    D      8  2020
5    D    100  2021

I merge a temporary dataframe (DF2.set_index(['ColX', 'ColY'])[['ColZ']]) into DF. This adds a ColZ column holding DF2's value for every row whose COL1 and year match that frame's index (ColX and ColY). All non-matching rows get NaN.
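To see what the merge step produces before update runs, the sketch below repeats the answer's merge and inspects the intermediate frame (the name merged is only illustrative). Only the row keyed ('D', '2021') picks up a ColZ value; the other rows get NaN, and because of those NaNs pandas stores ColZ as a float column.

```python
import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})

# The answer's merge: DF2 indexed by (ColX, ColY), joined against DF's
# (COL1, year) columns, keeping every row of DF (left join)
merged = DF.merge(DF2.set_index(['ColX', 'ColY'])[['ColZ']],
                  how='left',
                  left_on=['COL1', 'year'],
                  right_index=True)

# Rows without a matching (ColX, ColY) pair get NaN in ColZ
print(merged)
print(merged['ColZ'].dtype)
```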

I then use update to overwrite the values in DF.COL2 with the non-null values from DF.ColZ.
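One caveat, stated here as an assumption about newer pandas: with Copy-on-Write enabled (the default from pandas 3.0), an in-place call such as DF.COL2.update(DF.ColZ) operates on a Series derived from DF and may not write back into DF. A sketch that does not rely on in-place semantics assigns the column explicitly: ColZ is used where it has a value, COL2 otherwise, and the integer dtype is restored because the NaNs in ColZ force a float intermediate.

```python
import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})

DF = DF.merge(DF2.set_index(['ColX', 'ColY'])[['ColZ']],
              how='left', left_on=['COL1', 'year'], right_index=True)

# Explicit assignment instead of the in-place DF.COL2.update(DF.ColZ):
# take ColZ where it has a value, fall back to the current COL2 otherwise.
# ColZ is float (because of its NaNs), so cast back to COL2's integer dtype.
DF['COL2'] = DF['ColZ'].fillna(DF['COL2']).astype(DF['COL2'].dtype)

# Same clean-up as `del DF['ColZ']`, written as a non-mutating drop
DF = DF.drop(columns='ColZ')
print(DF)
```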

I then delete DF['ColZ'] to clean up.
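The uniqueness assumption matters because a left merge emits one output row per matching right-hand row. The sketch below uses a hypothetical DF2_dup that lists the ('D', '2021') pair twice, so the merge returns 7 rows instead of 6. Checking with duplicated() beforehand, or passing merge's validate='many_to_one' so that pandas raises MergeError, catches this early. DF2_dup, lookup, and grown are illustrative names.

```python
import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
# Hypothetical lookup table that lists the ('D', '2021') pair twice
DF2_dup = pd.DataFrame({'ColX': ['D', 'D'], 'ColY': ['2021', '2021'],
                        'ColZ': [100, 200]})
lookup = DF2_dup.set_index(['ColX', 'ColY'])[['ColZ']]

# 1) Cheap up-front check for repeated key pairs
has_dupes = DF2_dup.duplicated(['ColX', 'ColY']).any()
print('duplicate key pairs:', has_dupes)

# 2) Without a check, the left merge emits one row per matching DF2 row,
#    so the ('D', '2021') row of DF appears twice
grown = DF.merge(lookup, how='left', left_on=['COL1', 'year'], right_index=True)
print('rows after merge:', len(grown))

# 3) validate='many_to_one' asks pandas to reject non-unique right-hand keys
try:
    DF.merge(lookup, how='left', left_on=['COL1', 'year'], right_index=True,
             validate='many_to_one')
    raised = False
except pd.errors.MergeError:
    raised = True
print('MergeError raised:', raised)
```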

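If the name of DF2's value column (ColZ here) already exists as a column in DF, you would need to make some adjustments: merge appends its default suffixes (_x and _y) to overlapping column names, so DF.ColZ would no longer refer to the merged values.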
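As an example of such an adjustment, suppose (hypothetically) DF2's value column were itself named COL2. Passing suffixes=('', '_new') to merge keeps DF's COL2 under its own name and labels the incoming values COL2_new, which can then be folded in and dropped. DF2_same, COL2_new, and merged are illustrative names only.

```python
import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
# Hypothetical lookup table whose value column is also called COL2
DF2_same = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'COL2': [100]})

# suffixes=('', '_new'): DF's COL2 keeps its name, the incoming one becomes COL2_new
merged = DF.merge(DF2_same.set_index(['ColX', 'ColY'])[['COL2']],
                  how='left', left_on=['COL1', 'year'], right_index=True,
                  suffixes=('', '_new'))
cols_after_merge = list(merged.columns)
print(cols_after_merge)

# Fold the new values in where present, then drop the helper column
merged['COL2'] = merged['COL2_new'].fillna(merged['COL2']).astype(merged['COL2'].dtype)
merged = merged.drop(columns='COL2_new')
print(merged)
```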

An alternative is to index both frames by their key columns and call DataFrame.update directly:

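DF = DF.set_index(['COL1', 'year'])
# update() works in place and returns None; it aligns on index labels and column names
DF.update(DF2.set_index(['ColX', 'ColY']).rename(columns={'ColZ': 'COL2'}))
DF.reset_index(inplace=True)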

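This gives the same values as the merge approach, but reset_index() puts the index columns first, so the columns come back as COL1, year, COL2; select DF[['COL1', 'COL2', 'year']] to restore the original order. Depending on the pandas version, update() may also leave COL2 as float.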
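A self-contained sketch of this alternative that also restores the original column layout is below. It assumes the same DF and DF2 as in the question; indexed and result are illustrative names. The explicit cast to int64 is a precaution, since whether update() upcasts COL2 to float has varied across pandas versions.

```python
import pandas as pd

DF = pd.DataFrame({'COL1': ['A', 'B', 'C', 'D', 'D', 'D'],
                   'COL2': [11032, 1960, 11400, 11355, 8, 7],
                   'year': ['2016', '2017', '2018', '2019', '2020', '2021']})
DF2 = pd.DataFrame({'ColX': ['D'], 'ColY': ['2021'], 'ColZ': [100]})

# Index both frames by their key pair so update() can align rows by label
indexed = DF.set_index(['COL1', 'year'])
# update() only touches columns present in both frames, hence the rename;
# it modifies `indexed` in place and returns None
indexed.update(DF2.set_index(['ColX', 'ColY']).rename(columns={'ColZ': 'COL2'}))

# Move the keys back into columns, restore the original column order,
# and make sure COL2 is integer again in case update() upcast it
result = (indexed.reset_index()[['COL1', 'COL2', 'year']]
          .astype({'COL2': 'int64'}))
print(result)
```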
