
Regular expression

Here is some of the data:

Broan Range Hood (BP130WWN) - White
Broan Range Hood (BP130BLN) - Black
Broan Range Hood (GP124WWN) - White
Broan Range Hood (GP130WWN) - White
Broan Range Hood (QS130WWN) - White
Broan Range Hood (QS130BLN) - Black
Broan Range Hood (QS130SSN) - Stainless
Broan Range Hood (QS230WWN) - White
Broan Range Hood (QS230BLN) - Black
Broan Range Hood (QS230SSN) - Stainless
Broan Range Hood (QS330WWN) - White
Broan Range Hood (QS330BLN) - Black
Broan Range Hood (QS330SSN) - Stainless
Broan Range Hood (E66130SSL) - Stainless
Broan Range Hood (RM503004) - Stainless
Broan Range Hood (273003) - Stainless

I want to remove codes such as (RM503004) and (273003): alphanumeric codes of 3 to 11 characters wrapped in parentheses.

In Python, I tried the following:

text = re.sub('[a-zA-Z0-9]{3,11}', ' ', dataset['Title'][i])

But the output is not what I expected. The output I want is:

Broan Range Hood  - White
Broan Range Hood  - Black
Broan Range Hood  - White
Broan Range Hood  - White
Broan Range Hood  - White
Broan Range Hood  - Black
Broan Range Hood  - Stainless
Broan Range Hood  - White
Broan Range Hood  - Black
Broan Range Hood  - Stainless
Broan Range Hood  - White
Broan Range Hood  - Black
Broan Range Hood  - Stainless
Broan Range Hood  - Stainless
Broan Range Hood  - Stainless
Broan Range Hood  - Stainless

You also need to match the literal parentheses, and they must be escaped with a backslash.
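A small sketch of why the escaping matters, using one title from the data above: without the parentheses in the pattern, any run of 3 to 11 letters or digits matches, including ordinary words. The variable names `unescaped` and `escaped` are illustrative only.

```python
import re

title = "Broan Range Hood (BP130WWN) - White"

# Pattern from the question: no parentheses, so every run of 3-11
# letters/digits matches, including ordinary words like "Broan".
unescaped = re.findall(r"[a-zA-Z0-9]{3,11}", title)
print(unescaped)  # ['Broan', 'Range', 'Hood', 'BP130WWN', 'White']

# Escaped parentheses: only the bracketed code matches.
escaped = re.findall(r"\([a-zA-Z0-9]{3,11}\)", title)
print(escaped)  # ['(BP130WWN)']
```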

To keep the whitespace tidy, also match the surrounding whitespace and replace the whole match with a single space:

text = re.sub(r'\s*\([a-zA-Z0-9]{3,11}\)\s*', ' ', dataset['Title'][i])
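A self-contained sketch of the same substitution. The question does not show what `dataset` is; here a plain list named `titles` (hypothetical) stands in for the `Title` column, and the helper `strip_code` is also hypothetical. If `dataset` is a pandas DataFrame, `dataset['Title'].str.replace(pattern, ' ', regex=True)` would apply the same pattern to the whole column without a loop.

```python
import re

# Hypothetical sample titles standing in for dataset['Title'].
titles = [
    "Broan Range Hood (BP130WWN) - White",
    "Broan Range Hood (E66130SSL) - Stainless",
    "Broan Range Hood (273003) - Stainless",
]

# Optional whitespace, a literal "(", 3 to 11 letters/digits, a literal ")",
# then optional whitespace; the whole match is replaced by one space.
CODE_PATTERN = re.compile(r"\s*\([a-zA-Z0-9]{3,11}\)\s*")

def strip_code(title):
    """Remove a parenthesised 3-11 character alphanumeric code from a title."""
    return CODE_PATTERN.sub(" ", title)

cleaned = [strip_code(t) for t in titles]
for line in cleaned:
    print(line)
```

Titles without a code, or with a code longer than 11 characters, are returned unchanged because the pattern does not match them.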

You were very close; you just need to add the escaped parentheses:

\([a-zA-Z0-9]{3,11}\)\s*

Test

import re

string = '''
Broan Range Hood (BP130WWN) - White
Broan Range Hood (BP130BLN) - Black
Broan Range Hood (GP124WWN) - White
Broan Range Hood (GP130WWN) - White
Broan Range Hood (QS130WWN) - White
Broan Range Hood (QS130BLN) - Black
Broan Range Hood (QS130SSN) - Stainless
Broan Range Hood (QS230WWN) - White
Broan Range Hood (QS230BLN) - Black
Broan Range Hood (QS230SSN) - Stainless
Broan Range Hood (QS330WWN) - White
Broan Range Hood (QS330BLN) - Black
Broan Range Hood (QS330SSN) - Stainless
Broan Range Hood (E66130SSL) - Stainless
Broan Range Hood (RM503004) - Stainless
Broan Range Hood (273003) - Stainless

'''

# A literal "(", 3 to 11 letters/digits, a literal ")", then any trailing whitespace
expression = r'\([a-zA-Z0-9]{3,11}\)\s*'

print(re.sub(expression, '', string))

Output


Broan Range Hood - White
Broan Range Hood - Black
Broan Range Hood - White
Broan Range Hood - White
Broan Range Hood - White
Broan Range Hood - Black
Broan Range Hood - Stainless
Broan Range Hood - White
Broan Range Hood - Black
Broan Range Hood - Stainless
Broan Range Hood - White
Broan Range Hood - Black
Broan Range Hood - Stainless
Broan Range Hood - Stainless
Broan Range Hood - Stainless
Broan Range Hood - Stainless

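If you want to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you are interested, you can also watch the matching steps, or modify them, in the debugger link. The debugger shows step by step how a RegEx engine consumes some sample input strings and performs the match.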
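As a local complement to the regex101 explanation panel, the same expression can be written with Python's `re.VERBOSE` flag, which ignores whitespace and `#` comments inside the pattern so each piece can be annotated. This is a sketch; the name `CODE_PATTERN` is illustrative.

```python
import re

# Same pattern as in the test above, annotated piece by piece.
CODE_PATTERN = re.compile(
    r"""
    \(                  # a literal opening parenthesis
    [a-zA-Z0-9]{3,11}   # 3 to 11 ASCII letters or digits
    \)                  # a literal closing parenthesis
    \s*                 # trailing whitespace, so no double space is left
    """,
    re.VERBOSE,
)

cleaned = CODE_PATTERN.sub("", "Broan Range Hood (RM503004) - Stainless")
print(cleaned)  # Broan Range Hood - Stainless
```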



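Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.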

 
© 2020-2024 STACKOOM.COM