Regular expression
Here is part of the data:
Broan Range Hood (BP130WWN) - White Broan Range Hood (BP130BLN) - Black Broan Range Hood (GP124WWN) - White Broan Range Hood (GP130WWN) - White Broan Range Hood (QS130WWN) - White Broan Range Hood (QS130BLN) - Black Broan Range Hood (QS130SSN) - Stainless Broan Range Hood (QS230WWN) - White Broan Range Hood (QS230BLN) - Black Broan Range Hood (QS230SSN) - Stainless Broan Range Hood (QS330WWN) - White Broan Range Hood (QS330BLN) - Black Broan Range Hood (QS330SSN) - Stainless Broan Range Hood (E66130SSL) - Stainless Broan Range Hood (RM503004) - Stainless Broan Range Hood (273003) - Stainless
I want to remove codes such as (RM503004) and (273003). Each one is an alphanumeric code of 3 to 11 characters wrapped in ().
In Python, I tried the following:
text = re.sub('[a-zA-Z0-9]{3,11}', ' ', dataset['Title'][i])
But its output is not what I expected, which would be:
Broan Range Hood - White Broan Range Hood - Black Broan Range Hood - White Broan Range Hood - White Broan Range Hood - White Broan Range Hood - Black Broan Range Hood - Stainless Broan Range Hood - White Broan Range Hood - Black Broan Range Hood - Stainless Broan Range Hood - White Broan Range Hood - Black Broan Range Hood - Stainless Broan Range Hood - Stainless Broan Range Hood - Stainless Broan Range Hood - Stainless
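You also need to match the literal parentheses, which must be escaped with a backslash.
To keep the spacing tidy, match the surrounding whitespace as well, then replace the whole match with a single space: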
text = re.sub(r'\s*\([a-zA-Z0-9]{3,11}\)\s*', ' ', dataset['Title'][i])
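To see why the escaped parentheses matter, here is a minimal sketch that runs both patterns on a single title from the data. The names `title`, `unescaped`, and `escaped` are illustrative and not part of the original post.

```python
import re

title = "Broan Range Hood (RM503004) - Stainless"

# Without the parentheses, every alphanumeric run of 3 to 11 characters
# matches, so ordinary words such as "Broan" are replaced as well.
unescaped = re.sub(r'[a-zA-Z0-9]{3,11}', ' ', title)

# With escaped parentheses, only the bracketed code (plus the whitespace
# around it) matches, and it is replaced by a single space.
escaped = re.sub(r'\s*\([a-zA-Z0-9]{3,11}\)\s*', ' ', title)

print(repr(unescaped))
print(repr(escaped))
```

The raw-string prefix `r'...'` keeps Python from interpreting `\s` and `\(` as string escapes before the pattern reaches the `re` module.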
You are very close; just add the escaped parentheses:
\([a-zA-Z0-9]{3,11}\)\s*
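The pieces of this pattern can be spelled out one per line with `re.VERBOSE`. This is only an annotated sketch of the same expression; the names `CODE_PATTERN` and `sample` are hypothetical.

```python
import re

# Same expression as above, written in verbose mode so each part can be
# commented. Whitespace and "#" comments inside the pattern are ignored.
CODE_PATTERN = re.compile(r'''
    \(                  # a literal opening parenthesis
    [a-zA-Z0-9]{3,11}   # 3 to 11 letters or digits
    \)                  # a literal closing parenthesis
    \s*                 # any whitespace that follows the code
''', re.VERBOSE)

sample = "Broan Range Hood (273003) - Stainless"
print(CODE_PATTERN.sub('', sample))
```

Codes shorter than 3 or longer than 11 characters are left alone, because the quantifier `{3,11}` has to be satisfied between the two parentheses.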
import re
string = '''
Broan Range Hood (BP130WWN) - White
Broan Range Hood (BP130BLN) - Black
Broan Range Hood (GP124WWN) - White
Broan Range Hood (GP130WWN) - White
Broan Range Hood (QS130WWN) - White
Broan Range Hood (QS130BLN) - Black
Broan Range Hood (QS130SSN) - Stainless
Broan Range Hood (QS230WWN) - White
Broan Range Hood (QS230BLN) - Black
Broan Range Hood (QS230SSN) - Stainless
Broan Range Hood (QS330WWN) - White
Broan Range Hood (QS330BLN) - Black
Broan Range Hood (QS330SSN) - Stainless
Broan Range Hood (E66130SSL) - Stainless
Broan Range Hood (RM503004) - Stainless
Broan Range Hood (273003) - Stainless
'''
expression = r'\([a-zA-Z0-9]{3,11}\)\s*'
print(re.sub(expression, '', string))
Output:
Broan Range Hood - White
Broan Range Hood - Black
Broan Range Hood - White
Broan Range Hood - White
Broan Range Hood - White
Broan Range Hood - Black
Broan Range Hood - Stainless
Broan Range Hood - White
Broan Range Hood - Black
Broan Range Hood - Stainless
Broan Range Hood - White
Broan Range Hood - Black
Broan Range Hood - Stainless
Broan Range Hood - Stainless
Broan Range Hood - Stainless
Broan Range Hood - Stainless
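For many titles, as with `dataset['Title']` in the question, the pattern could be compiled once and applied to each title in turn. A minimal sketch, assuming the titles are plain strings held in a list; `CODE_PATTERN`, `strip_codes`, and `titles` are hypothetical names, not from the original post.

```python
import re

# Compile once and reuse for every title.
CODE_PATTERN = re.compile(r'\s*\([a-zA-Z0-9]{3,11}\)\s*')

def strip_codes(title):
    """Replace each parenthesized code with one space, then trim the ends."""
    return CODE_PATTERN.sub(' ', title).strip()

titles = [
    "Broan Range Hood (E66130SSL) - Stainless",
    "Broan Range Hood (273003) - Stainless",
]
cleaned = [strip_codes(t) for t in titles]
print(cleaned)
```

The trailing `.strip()` is there for titles where the code sits at the very start or end, which would otherwise leave a stray space.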
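If you wish to simplify, update, or explore the expression, it is explained in the top-right panel of regex101.com. If you are interested, you can watch the matching steps or modify them in this debugger link. The debugger demonstrates how a regex engine might step through some sample input strings and perform the matching process.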