
Conditionally replace a string in a pandas Series with another string

Take the example below. To replace one string in one particular column, I did the following, and it worked fine:

import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'B1', 'C1', 'A1', 'B1', 'C1']},
                   columns = ['key', 'data1', 'data2'])

  key  data1 data2
0   A      0    A1
1   B      1    B1
2   C      2    C1
3   A      3    A1
4   B      4    B1
5   C      5    C1



df['data2']= df['data2'].str.strip().str.replace("A1","Bad")

  key  data1 data2
0   A      0   Bad
1   B      1    B1
2   C      2    C1
3   A      3   Bad
4   B      4    B1
5   C      5    C1
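
One detail worth keeping in mind about the snippet above: `Series.str.replace` substitutes substrings inside each value, whereas `Series.replace` matches whole cell values by default. A minimal sketch of the difference, using a made-up Series `s` that is not part of the original post:

```python
import pandas as pd

# Hypothetical data, for illustration only
s = pd.Series(['A1', 'A10', 'B1'])

# str.replace works on substrings, so 'A10' is also affected
substring_result = s.str.replace('A1', 'Bad', regex=False)

# Series.replace compares the whole cell value, so 'A10' is left alone
whole_value_result = s.replace('A1', 'Bad')

print(substring_result.tolist())
print(whole_value_result.tolist())
```

Which of the two is appropriate depends on whether values such as `A10` should be touched at all.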

Q(1) How can we conditionally replace one string? That is, in column data2, I would like to replace A1, but only if key == "A" and data1 > 1. How can I do that?

Q(2) Can the conditional replacement be applied to multiple values (i.e., replacing A1 and A2 at the same time with "Bad"), but only under similar conditions?

You can use numpy and a regex-based replacement to cover A1, A2 and more. If we extend your data to include an example with A3:

import pandas as pd
import numpy as np

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C', 'A'],
                   'data1': range(7),
                   'data2': ['A1', 'B1', 'C1', 'A1', 'B1', 'C1', 'A3']},
                   columns=['key', 'data1', 'data2'])

df['data2'] = np.where((df['key'] == 'A') & (df['data1'] > 1),
                       df['data2'].str.replace(r'A\d+', 'Bad', regex=True),
                       df['data2'])

This returns:

  key  data1 data2
0   A      0    A1
1   B      1    B1
2   C      2    C1
3   A      3   Bad
4   B      4    B1
5   C      5    C1
6   A      6   Bad
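
As a possible alternative to `np.where`, the same selection can be sketched with `Series.mask`, which stays inside pandas and keeps the index. The names `cond` and `replaced` are introduced here for illustration, and `regex=True` is passed explicitly because the default of `str.replace` differs between pandas versions:

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C', 'A'],
                   'data1': range(7),
                   'data2': ['A1', 'B1', 'C1', 'A1', 'B1', 'C1', 'A3']},
                  columns=['key', 'data1', 'data2'])

# Rows where the replacement is allowed
cond = (df['key'] == 'A') & (df['data1'] > 1)

# Candidate values with every "A<digits>" rewritten to "Bad"
replaced = df['data2'].str.replace(r'A\d+', 'Bad', regex=True)

# Take the rewritten value where cond is True, keep the original elsewhere
df['data2'] = df['data2'].mask(cond, replaced)

print(df)
```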

I think you need to filter the column on both sides of the assignment with the same mask, so that the replacement is applied only to the filtered rows:

mask = (df['key']=="A") &  (df['data1'] > 1)
df.loc[mask, 'data2']= df.loc[mask, 'data2'].str.strip().str.replace("A1","Bad")  

print (df)
  key  data1 data2
0   A      0    A1
1   B      1    B1
2   C      2    C1
3   A      3   Bad
4   B      4    B1
5   C      5    C1
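
If this pattern is needed in several places, it could be wrapped in a small helper. The function `replace_where` below is a hypothetical name invented for this sketch, not a pandas API; it works on a copy so the input frame is left unchanged:

```python
import pandas as pd

def replace_where(df, mask, column, old, new):
    """Return a copy of df with `old` replaced by `new` in `column`,
    only on the rows where `mask` is True (literal match, no regex)."""
    out = df.copy()
    out.loc[mask, column] = (out.loc[mask, column]
                             .str.strip()
                             .str.replace(old, new, regex=False))
    return out

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'B1', 'C1', 'A1', 'B1', 'C1']},
                  columns=['key', 'data1', 'data2'])

mask = (df['key'] == 'A') & (df['data1'] > 1)
result = replace_where(df, mask, 'data2', 'A1', 'Bad')

print(result)
```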

If you need multiple replacements, use replace with a dict:

df = pd.DataFrame({'key': ['A', 'A', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'A2', 'C1', 'A1', 'B1', 'C1']},
                   columns = ['key', 'data1', 'data2'])

mask = (df['key']=="A") &  (df['data1'] > 0)
df.loc[mask, 'data2']= df.loc[mask, 'data2'].str.strip().replace({"A1":"Bad", "A2":'Bad1'})  

Or use a regex:

df.loc[mask, 'data2'] = df.loc[mask, 'data2'].str.strip().str.replace(r'^A.*', "Bad", regex=True)


print (df)
  key  data1 data2
0   A      0    A1
1   A      1  Bad1
2   C      2    C1
3   A      3   Bad
4   B      4    B1
5   C      5    C1
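
When all the targets should map to the same value and only exact matches matter, another option is to fold the value test into the mask with `isin` and assign the replacement directly. This is a sketch under those assumptions; `targets` and `hit` are names introduced here:

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'A', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'A2', 'C1', 'A1', 'B1', 'C1']},
                  columns=['key', 'data1', 'data2'])

mask = (df['key'] == "A") & (df['data1'] > 0)

# Exact values that should be replaced
targets = ['A1', 'A2']

# Combine the row condition with the value condition
hit = mask & df['data2'].str.strip().isin(targets)

# Every selected cell gets the same replacement
df.loc[hit, 'data2'] = 'Bad'

print(df)
```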

If we want to extend the example above in the following way:

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'B1', 'C1', 'A1', 'B1', 'C1']},
                   columns = ['key', 'data1', 'data2'])  

mask = (df['data1'] > 1)
df.loc[mask, 'data2']= df.loc[mask, 'data2'].str.strip().str.replace("A1",df['key']) 

  key  data1 data2
0   A      0    A1
1   B      1    B1
2   C      2   NaN
3   A      3   NaN
4   B      4   NaN
5   C      5   NaN

I am very surprised by the result. I thought the content of data2 would be replaced by the content of column "key" (under the condition data1 > 1). Any idea?
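
A possible explanation, offered as a sketch rather than a definitive answer: `str.replace` expects its replacement argument to be a single string or a callable, not a Series, so it does not pair each row of data2 with the matching row of key. As far as I know, recent pandas versions raise a `TypeError` for that call instead of returning NaN. Assuming the goal is to overwrite data2 with the row's key wherever data1 > 1 and data2 equals A1, one way is to put both conditions in the mask and assign the key column directly, letting pandas align on the index:

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'B1', 'C1', 'A1', 'B1', 'C1']},
                  columns=['key', 'data1', 'data2'])

# Row condition plus an exact match on the value to be replaced
mask = (df['data1'] > 1) & (df['data2'].str.strip() == 'A1')

# Assign the per-row value from 'key'; pandas aligns on the index
df.loc[mask, 'data2'] = df.loc[mask, 'key']

print(df)
```

Only row 3 satisfies both conditions here, so that is the only cell that changes.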


 