Replace strings in pandas cell using regex, between two specific strings
There were some topics about replacing strings "between" other strings, but I think something is wrong in my regex, or maybe I should use a different approach.
I need to replace a word in my Name column with 'is not'. In this case the word is 'is', but it will not always be 'is'; sometimes it is a different word. This specific word always sits directly between two numbers that end with 'h'.
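The "numbers ending with h" can be described by a single pattern: digits, an optional decimal part, then a literal h. A minimal sketch that checks such a pattern against one sample cell with re.findall before using it in a replacement (the name NUM_H is only for illustration, and the {1,8}/{1,4} length limits used in the code below are dropped here):

```python
import re

# Digits, an optional '.digits' part (the dot is escaped), then a literal 'h'.
NUM_H = r'\d+(?:\.\d+)?h'

tokens = re.findall(NUM_H, 'Adam is 23.2h is 223h mike is 223h')
print(tokens)                                  # ['23.2h', '223h', '223h']
print(re.findall(NUM_H, 'Brody like mike'))    # []
```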
My df:
import pandas as pd
df=pd.DataFrame({'Name':['Adam is 23.2h is 223h mike is 223h',
'Katie is 13.2h is 22h mike is 223h','Ilam is 2h is 223h mike is 223h',
'Katie','Brody','Brody like mike'],
'B':[20,20,21,21,22,21]})
B Name
0 20 Adam is 23.2h is 223h mike is 223h
1 20 Katie is 13.2h is 22h mike is 223h
2 21 Ilam is 2h is 223h mike is 223h
3 21 Katie
4 22 Brody
5 21 Brody like mike
Expected output:
B Name
0 20 Adam is 23.2h is not 223h mike is 223h
1 20 Katie is 13.2h is not 22h mike is 223h
2 21 Ilam is 2h is not 223h mike is 223h
3 21 Katie
4 22 Brody
5 21 Brody like mike
Code:
df.Name=df.Name.replace({'([0-9]{1,8}.[0-9]{1,4}h|[0-9]{1,8}h)(.*?)([0-9]{1,8}.[0-9]{1,4}h|[0-9]{1,8}h)':'is not'},regex=True)
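To see what goes wrong, it can help to run the same pattern with Python's re.sub on a single cell, which is roughly what Series.replace(..., regex=True) does for each string. A minimal sketch on one sample string: a plain replacement string such as 'is not' overwrites the entire match, so both numbers are consumed along with the word, while backreferences to the captured groups put them back.

```python
import re

# The pattern from the question, unchanged (its unescaped '.' matches any character).
pattern = r'([0-9]{1,8}.[0-9]{1,4}h|[0-9]{1,8}h)(.*?)([0-9]{1,8}.[0-9]{1,4}h|[0-9]{1,8}h)'
text = 'Adam is 23.2h is 223h mike is 223h'

# A plain replacement string replaces the whole match, numbers included.
whole = re.sub(pattern, 'is not', text)

# \1 and \3 re-insert the captured numbers around the new word.
kept = re.sub(pattern, r'\1 is not \3', text)

print(whole)  # Adam is is not mike is 223h
print(kept)   # Adam is 23.2h is not 223h mike is 223h
```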
A plain replacement string replaces the whole match, including both numbers. To keep the matched groups, reference them in the replacement: r'\1 is not \3'. It also seems you can use a slightly simpler regex:
# Escape the dot so it only matches a literal decimal point
df.Name.replace({r'([0-9]{1,8}(?:\.[0-9]{1,4})?h)(.*?)([0-9]{1,8}(?:\.[0-9]{1,4})?h)': r'\1 is not \3'}, regex=True)
0 Adam is 23.2h is not 223h mike is 223h
1 Katie is 13.2h is not 22h mike is 223h
2 Ilam is 2h is not 223h mike is 223h
3 Katie
4 Brody
5 Brody like mike
Name: Name, dtype: object
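Putting this together, a minimal end-to-end sketch on the question's DataFrame. It assumes the word to replace is always a single word (\w+) separated from the two numbers by whitespace, which is narrower than the lazy (.*?) used above and stops the match from spanning several words. It swaps in Series.str.replace with regex=True, which accepts the same \1/\3 backreferences; NUM_H and PATTERN are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Adam is 23.2h is 223h mike is 223h',
                            'Katie is 13.2h is 22h mike is 223h',
                            'Ilam is 2h is 223h mike is 223h',
                            'Katie', 'Brody', 'Brody like mike'],
                   'B': [20, 20, 21, 21, 22, 21]})

# A number with an optional decimal part (escaped dot), immediately followed by 'h'.
NUM_H = r'\d{1,8}(?:\.\d{1,4})?h'
# Group 1: left number, group 2: the single word in between, group 3: right number.
PATTERN = rf'({NUM_H})\s+(\w+)\s+({NUM_H})'

# Keep both numbers via \1 and \3; only the middle word is replaced.
df['Name'] = df['Name'].str.replace(PATTERN, r'\1 is not \3', regex=True)
print(df)
```

Rows without two such numbers (Katie, Brody, Brody like mike) have no match and are left unchanged.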
You can try using apply with re.sub(r'(?<=\dh )is', 'is not', text). The lookbehind (?<=\dh ) only lets 'is' match when it directly follows a digit, an 'h' and a space.
Code
import pandas as pd
import re
df=pd.DataFrame({'Name':['Adam is 23.2h is 223h mike is 223h',
'Katie is 13.2h is 22h mike is 223h','Ilam is 2h is 223h mike is 223h',
'Katie','Brody','Brody like mike'],
'B':[20,20,21,21,22,21]})
df['Name'] = df['Name'].apply(lambda t: re.sub(r'(?<=\dh )is', 'is not', t))
Output
print(df)
# Name B
# 0 Adam is 23.2h is not 223h mike is 223h 20
# 1 Katie is 13.2h is not 22h mike is 223h 20
# 2 Ilam is 2h is not 223h mike is 223h 21
# 3 Katie 21
# 4 Brody 22
# 5 Brody like mike 21
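The pattern above only replaces the literal word 'is'. If the word varies, as the question says it can, one option is to match any single word (\w+) after the same fixed-width lookbehind and require a number ending in 'h' after it with a lookahead. Python's re requires lookbehinds to be fixed width, which (?<=\dh ) is; the lookahead may be variable width and is not consumed, so the numbers stay untouched. This sketch assumes exactly one space on each side of the word, uses the vectorized Series.str.replace instead of apply, and adds a made-up last row to show a different word being replaced.

```python
import pandas as pd

names = pd.Series(['Adam is 23.2h is 223h mike is 223h',
                   'Katie is 13.2h is 22h mike is 223h',
                   'Ilam is 2h is 223h mike is 223h',
                   'Katie', 'Brody', 'Brody like mike',
                   'Adam 1h was 2h'])  # hypothetical extra row with a different word

# (?<=\dh ) : fixed-width lookbehind, "<digit>h " must come right before the word
# \w+       : the word itself, whatever it is
# (?= ...)  : lookahead, " <number>h" must follow; it is checked but not replaced
PATTERN = r'(?<=\dh )\w+(?= \d+(?:\.\d+)?h)'

result = names.str.replace(PATTERN, 'is not', regex=True)
print(result.tolist())
```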