Replace a string in a pandas cell using a regex, between two specific strings

Question

There have been some topics about replacing strings 'between' other strings, but I think something is wrong with my regex, or maybe I should use a different approach.

In my Name column I need to replace a word (in this case `is`, but it will not always be `is`; sometimes it is a different word) with `is not`. This specific word sits directly between two numbers that end with 'h'.

My df:

df=pd.DataFrame({'Name':['Adam is 23.2h is 223h mike is 223h',
'Katie is 13.2h is 22h mike is 223h','Ilam is 2h is 223h mike is 223h',
'Katie','Brody','Brody like mike'],
'B':[20,20,21,21,22,21]})

    B                                Name
0  20  Adam is 23.2h is 223h mike is 223h
1  20  Katie is 13.2h is 22h mike is 223h
2  21     Ilam is 2h is 223h mike is 223h
3  21                               Katie
4  22                               Brody
5  21                     Brody like mike

Expected output:

    B                                     Name
0  20   Adam is 23.2h is not 223h mike is 223h
1  20   Katie is 13.2h is not 22h mike is 223h
2  21      Ilam is 2h is not 223h mike is 223h
3  21                                    Katie
4  22                                    Brody
5  21                          Brody like mike

Code:

df.Name=df.Name.replace({'([0-9]{1,8}.[0-9]{1,4}h|[0-9]{1,8}h)(.*?)([0-9]{1,8}.[0-9]{1,4}h|[0-9]{1,8}h)':'is not'},regex=True)

Answer 1

To reuse the matched groups, write the replacement as r'\1 is not \3'. It also seems you can use a slightly simpler regex:

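# The dots are escaped so they match only a literal decimal point,
# and the inner optional groups are non-capturing so \3 is the second number.
df.Name.replace({r'([0-9]{1,8}(?:\.[0-9]{1,4})?h)(.*?)([0-9]{1,8}(?:\.[0-9]{1,4})?h)': r'\1 is not \3'}, regex=True)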

0    Adam is 23.2h is not 223h mike is 223h
1    Katie is 13.2h is not 22h mike is 223h
2       Ilam is 2h is not 223h mike is 223h
3                                     Katie
4                                     Brody
5                           Brody like mike
Name: Name, dtype: object
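The same back-reference idea can be sketched with `Series.str.replace`. This is only a minimal illustration: the `NUM_H` helper name and the small sample Series (including the made-up word `was`) are assumptions and not part of the original question.

```python
import pandas as pd

# A number ending in 'h' with an optional decimal part; the dot is escaped
# so that it matches only a literal '.'.
NUM_H = r'[0-9]{1,8}(?:\.[0-9]{1,4})?h'

# Group 1: first number, group 2: the text in between (lazy), group 3: second number.
pattern = rf'({NUM_H})(.*?)({NUM_H})'

# Hypothetical sample data; 'was' stands in for "a different word".
names = pd.Series([
    'Adam is 23.2h is 223h mike is 223h',
    'Ilam is 2h was 223h',
    'Katie',
])

# \1 and \3 put the two numbers back, so only the text between them changes.
result = names.str.replace(pattern, r'\1 is not \3', regex=True)
print(result.tolist())
```

Because group 2 is replaced wholesale, whatever sits between the first two numbers becomes ` is not `, whichever word it was. Rows without two such numbers are left unchanged.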

Answer 2

You can try using apply with re.sub(r'(?<=\dh )is', 'is not', text). The lookbehind (?<=\dh ) matches `is` only when it directly follows a digit, an 'h' and a space.

Code

import pandas as pd
import re

df=pd.DataFrame({'Name':['Adam is 23.2h is 223h mike is 223h',
'Katie is 13.2h is 22h mike is 223h','Ilam is 2h is 223h mike is 223h',
'Katie','Brody','Brody like mike'],
'B':[20,20,21,21,22,21]})

df['Name'] = df['Name'].apply(lambda t: re.sub(r'(?<=\dh )is', 'is not', t))

Output

print(df)
#                                      Name   B
# 0  Adam is 23.2h is not 223h mike is 223h  20
# 1  Katie is 13.2h is not 22h mike is 223h  20
# 2     Ilam is 2h is not 223h mike is 223h  21
# 3                                   Katie  21
# 4                                   Brody  22
# 5                         Brody like mike  21
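The lookbehind above only handles the literal word `is`. Since the question says the word may differ, one possible generalization is sketched below. It assumes the target is a single `\w+` word with exactly one space on each side, and the sample Series (with the made-up word `was`) is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; 'was' stands in for "a different word".
names = pd.Series([
    'Adam is 23.2h is 223h mike is 223h',
    'Ilam is 2h was 223h mike is 223h',
    'Brody like mike',
])

# (?<=\dh )      : preceded by a digit, 'h' and a space
# \w+            : the single word to replace (assumed to be one word)
# (?= \d[\d.]*h) : followed by a space and another number ending in 'h'
pattern = r'(?<=\dh )\w+(?= \d[\d.]*h)'

# Vectorized equivalent of apply + re.sub for a plain string replacement.
result = names.str.replace(pattern, 'is not', regex=True)
print(result.tolist())
```

The lookahead keeps `mike` untouched, because the word after `223h ` is not followed by another number ending in 'h'.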

Answer 1: score 1, accepted, 2019-11-16 09:59:40
Answer 2: score 0, 2019-11-16 09:47:12
