Pandas - Replace value from another column in certain conditions
I have two columns in my DataFrame. I would like to replace the value in the first column with the value in the second column if the text in the first column is a substring of the text in the second column.
Example:
Input:
col1         col2
----------------------------
text1        text1 and text2
some text    some other text
text 3
text 4       this is text 4
Output:
col1               col2
----------------------------------
text1 and text2    text1 and text2
some text          some other text
text 3
this is text 4     this is text 4
As you can see, I have replaced rows 1 and 4, because in each of those rows the text in column 1 is a substring of the text in column 2.
How can I perform this operation in pandas?
Try df.apply with axis=1.
This iterates over each row and checks whether col1 is a substring of col2. If it is, return col2; otherwise, return col1.
df['col1'] = df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)
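The lambda encodes a single per-row rule. As a minimal sketch, here it is pulled out into a named function (`pick` is a hypothetical name, not part of the answer) and called on hand-built rows; between two Python strings, `in` is a substring test.

```python
import pandas as pd


def pick(row: pd.Series) -> str:
    """Hypothetical helper: the same rule as the lambda above."""
    # For two str values, `a in b` is True when a occurs anywhere inside b
    return row['col2'] if row['col1'] in row['col2'] else row['col1']


print(pick(pd.Series({'col1': 'text 4', 'col2': 'this is text 4'})))
# this is text 4
print(pick(pd.Series({'col1': 'some text', 'col2': 'some other text'})))
# some text ('some text' does not occur verbatim in 'some other text')
```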
Full code:
import pandas as pd

df = pd.DataFrame({'col1': ['text1', 'some text', 'text 3', 'text 4'], 'col2': ['text1 and text2', 'some other text', '', 'this is text 4']})
# Write the result to a new column so it can be compared with the original col1
df['new_col1'] = df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)
df
        col1             col2         new_col1
0      text1  text1 and text2  text1 and text2
1  some text  some other text        some text
2     text 3                            text 3
3     text 4   this is text 4   this is text 4
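This version assumes every value in col2 is a string. A small sketch of what happens when col2 holds NaN instead (the one-row frame is made up for illustration): NaN is a float, so the `in` test raises TypeError, which is the case the NaN-safe options below guard against.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frame where col2 is missing (NaN)
df = pd.DataFrame({'col1': ['text 3'], 'col2': [np.nan]})

try:
    df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'],
             axis=1)
    error = None
except TypeError as exc:
    # 'text 3' in nan -> argument of type 'float' is not iterable
    error = exc

print(type(error).__name__)  # TypeError
```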
A NaN-safe Python option via zip:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
3: 'this is text 4'}
})
# isinstance() skips rows where col2 is NaN (a float), where `a in b` would raise TypeError
df['col1'] = [b if isinstance(b, str) and a in b else a
              for a, b in zip(df['col1'], df['col2'])]
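The same zip idea can be wrapped in a reusable function. This is a sketch only: `replace_if_substring` is a hypothetical name, and unlike the snippet above it also checks that the col1 value is a string, because a NaN on the left of `in` (for example `nan in 'some text'`) raises TypeError as well.

```python
import numpy as np
import pandas as pd


def replace_if_substring(df: pd.DataFrame, src: str, dst: str) -> pd.Series:
    """Hypothetical helper: return df[src], taking df[dst] wherever the src
    value is a substring of the dst value. Non-string values are left as-is."""
    values = [b if isinstance(a, str) and isinstance(b, str) and a in b else a
              for a, b in zip(df[src], df[dst])]
    return pd.Series(values, index=df.index, name=src)


df = pd.DataFrame({
    'col1': ['text1', 'some text', 'text 3 ', np.nan],
    'col2': ['text1 and text2', 'some other text', np.nan, 'this is text 4'],
})
df['col1'] = replace_if_substring(df, 'col1', 'col2')
print(df['col1'].tolist())
# ['text1 and text2', 'some text', 'text 3 ', nan]
```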
A NaN-safe pandas option via fillna + apply:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
3: 'this is text 4'}
})
# fillna('') works on a temporary copy: NaN becomes '' only for the substring test
df['col1'] = df.fillna('').apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1
)
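`df.fillna('')` returns a new DataFrame, so the empty strings exist only during the substring test and col2 itself keeps its NaN. One side effect worth knowing: if col1 is NaN, it also becomes '', and the empty string is a substring of every string, so that row takes the col2 value. A small sketch with made-up rows showing both points:

```python
import numpy as np
import pandas as pd

# Made-up rows: col1 missing in row 0, col2 missing in row 1
df = pd.DataFrame({'col1': [np.nan, 'text 3 '], 'col2': ['some other text', np.nan]})

filled = df.fillna('')  # new frame; df itself still holds NaN
df['col1'] = filled.apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1,
)

print(df['col1'].tolist())         # ['some other text', 'text 3 ']
print(df['col2'].isna().tolist())  # [False, True]
```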
Another option via a boolean index (isna) + loc:
# m marks rows where col2 is present (not NaN); only those rows are checked and updated
m = ~df['col2'].isna()
df.loc[m, 'col1'] = df[m].apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1
)
Resulting df:
              col1             col2
0  text1 and text2  text1 and text2
1        some text  some other text
2          text 3               NaN
3   this is text 4   this is text 4
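The same update can also be written without apply: build a boolean mask with the NaN-safe zip test, then let `Series.where` choose between the two columns. This is a swapped-in alternative, not one of the answers above; `Series.where(cond, other)` keeps values where `cond` is True and takes `other` elsewhere.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['text1', 'some text', 'text 3 ', 'text 4'],
    'col2': ['text1 and text2', 'some other text', np.nan, 'this is text 4'],
})

# True where both values are strings and col1 occurs inside col2
mask = pd.Series(
    [isinstance(a, str) and isinstance(b, str) and a in b
     for a, b in zip(df['col1'], df['col2'])],
    index=df.index,
)

# Take col2 where the mask holds, otherwise keep the existing col1
df['col1'] = df['col2'].where(mask, df['col1'])
print(df)
```

Building the mask in plain Python keeps the NaN handling explicit, and the final assignment is a single column-wise operation instead of a row-wise apply.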