How to use regex to remove a particular pattern of words with numbers?
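I have strings of text transcribed from different audio files. Each one contains a similar speaker label in a slightly different form, and I want a regex pattern that finds those labels and removes them from the actual text. For example, I have the following text: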
text = "Yeah Cool\nSpeaker 100:00:03Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,"
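What I want is a single regex pattern that detects Speaker 100:00:03 and other similar labels. Depending on the audio file, I may instead get Speaker 100:00:01, which differs from the first one but has the same shape.
Is there a better way to do this?
I am currently using string replace, which is not a general solution because it only removes one literal string. It looks like this: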
new_text = text.replace('Speaker 000:00:00', '')
This is the result I expect after applying the regex:
text = "Yeah Cool Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,"
Depending on the exact format of the timestamp, re.sub with the following pattern should work:
>>> import re
>>> re.sub(r'\nSpeaker \d{1,3}:\d{2}:\d{2}', ' ', text)
'Yeah Cool Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,'
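If the same cleanup has to run over many transcripts, the pattern could be compiled once and wrapped in a helper. The following is only a sketch: the name strip_speaker_labels and the two-label sample string are made up for illustration, and it assumes every label is "Speaker ", one or more digits, then ":MM:SS", possibly preceded by whitespace such as the newline in the question.

```python
import re

# Assumption: a label is "Speaker ", one or more digits, then ":MM:SS".
# \s* also consumes any whitespace (e.g. the newline) in front of the label.
SPEAKER_LABEL = re.compile(r"\s*Speaker \d+:\d{2}:\d{2}")

def strip_speaker_labels(text):
    """Replace every speaker/timestamp label with a single space."""
    return SPEAKER_LABEL.sub(" ", text).strip()

# Hypothetical sample with two labels, modeled on the text in the question.
sample = "Yeah Cool\nSpeaker 100:00:03Uh, you know\nSpeaker 200:00:07Right."
cleaned = strip_speaker_labels(sample)
print(cleaned)  # Yeah Cool Uh, you know Right.
```

Using \d+ instead of \d{1,3} is a judgment call: it tolerates longer digit runs before the first colon, at the cost of being slightly less strict about what counts as a label.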
A very simple regex:
import re
text = "Yeah Cool\nSpeaker 100:00:03Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,"
re.sub(r'\nSpeaker \d\d\d:\d\d:\d\d', ' ', text)
# 'Yeah Cool Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,'
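Before deleting anything, it can help to list what a pattern actually matches. The sketch below compares the fixed three-digit form with a more flexible \d+ form using re.findall. The second label, "Speaker 1200:01:15", is a hypothetical input that does not appear in the question; it is there only to show where the fixed-width pattern would stop matching.

```python
import re

# Hypothetical transcript: the second label has four digits before the first colon.
text = "Yeah Cool\nSpeaker 100:00:03Uh, you know\nSpeaker 1200:01:15Well, yes"

# Exactly three digits before the first colon, as in the simple pattern.
fixed = re.findall(r"Speaker \d\d\d:\d\d:\d\d", text)

# One or more digits before the first colon.
flexible = re.findall(r"Speaker \d+:\d\d:\d\d", text)

print(fixed)     # ['Speaker 100:00:03']
print(flexible)  # ['Speaker 100:00:03', 'Speaker 1200:01:15']
```

If all labels in the real data have exactly three digits there, the simple pattern is enough, and the stricter form is less likely to match something unintended.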