Pandas - Replace a column's values with another column's values under certain conditions

Question

I have two columns in my DataFrame. I would like to replace the value in the first column with the value in the second column if the text in the first column is a substring of the text in the second column.

Example:

Input: 

col1       col2
-----------------
text1      text1 and text2
some text  some other text
text 3     
text 4     this is text 4

Output:

col1                 col2
------------------------------
text1 and text2      text1 and text2
some text            some other text
text 3     
this is text 4       this is text 4

As you can see, I have replaced rows 1 and 4, because in those rows the text in column 1 is a substring of the text in column 2.

How can I perform this operation in pandas?

Answer 1

Try df.apply with axis=1.

This iterates over each row and checks whether col1 is a substring of col2.
If it is, the lambda returns col2; otherwise it returns col1.

df['col1'] = df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)

Full code:

import pandas as pd

df = pd.DataFrame({
    'col1': ['text1', 'some text', 'text 3', 'text 4'],
    'col2': ['text1 and text2', 'some other text', '', 'this is text 4']
})

# Write the result to a new column so the original col1 stays visible
df['new_col1'] = df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)

df

        col1    col2             new_col1
0   text1       text1 and text2  text1 and text2
1   some text   some other text  some text
2   text 3                       text 3
3   text 4      this is text 4   this is text 4
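
A small sketch of two properties of Python's `in` operator that affect the lambda above: the check is case-sensitive, and an empty string counts as a substring of every string. The sample rows here are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical rows chosen to exercise the edge cases
df = pd.DataFrame({
    'col1': ['text1', 'TEXT1', ''],
    'col2': ['text1 and text2', 'text1 and text2', 'anything'],
})

df['new_col1'] = df.apply(
    lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'],
    axis=1
)

# Row 0 matches, row 1 differs only in case, row 2 has an empty col1
result = df['new_col1'].tolist()
print(result)
```

If an empty col1 should not be overwritten, the condition would need an extra check such as `row['col1'] != ''`.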

Answer 2

A NaN-safe pure Python option via zip:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
    'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
             3: 'this is text 4'}
})

df['col1'] = [b if isinstance(b, str) and a in b else a
              for a, b in zip(df['col1'], df['col2'])]
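
To see why the `isinstance(b, str)` guard is there, the sketch below applies the same test to single values. NaN is a float, so `a in b` raises a TypeError when `b` is NaN. The helper name `replace_if_substring` is hypothetical and only wraps the expression from the list comprehension.

```python
import numpy as np

def replace_if_substring(a, b):
    """Return b when b is a string containing a, otherwise return a."""
    return b if isinstance(b, str) and a in b else a

# Without the guard, the membership test itself fails on NaN
try:
    'text 3 ' in np.nan
    raised = False
except TypeError:
    raised = True

print(raised)
print(replace_if_substring('text 3 ', np.nan))          # guard short-circuits, col1 is kept
print(replace_if_substring('text 4', 'this is text 4'))  # substring found, col2 is used
```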

A NaN-safe pandas option via fillna + apply:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
    'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
             3: 'this is text 4'}
})

df['col1'] = df.fillna('').apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1
)

Another option via boolean indexing with isna + loc:

m = ~df['col2'].isna()
df.loc[m, 'col1'] = df[m].apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1
)

df:

              col1             col2
0  text1 and text2  text1 and text2
1        some text  some other text
2          text 3               NaN
3   this is text 4   this is text 4
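
The boolean indexing idea can also be split into two steps: first build a mask of the rows where col1 is a substring of col2, then assign through loc. This is a variation on the answer's code rather than the answer's own method; the variable name `mask` is an assumption, and the data is the same sample used above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
    'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
             3: 'this is text 4'}
})

# True where col2 is a string and contains col1; NaN rows become False
mask = pd.Series(
    [isinstance(b, str) and a in b for a, b in zip(df['col1'], df['col2'])],
    index=df.index
)

# Overwrite col1 only on the matching rows
df.loc[mask, 'col1'] = df.loc[mask, 'col2']

print(df)
```

Keeping the mask separate makes it easy to inspect which rows will change before anything is overwritten.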

Answer 1
2 2021-05-23 15:41:40

Answer 2
1 accepted 2021-05-23 18:00:11
