How to use regex to remove a particular pattern of words with numbers?

I have a string of words generated from different audio files, and each file produces a slightly different version of a similar pattern of words. I want to use a regex pattern to match that pattern of words and remove it from the actual text. For example, I have the text below:

text = "Yeah Cool\nSpeaker 100:00:03Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,"

All I want is a regex pattern that can detect Speaker 100:00:03 and other similar patterns. Depending on the audio file, I might get Speaker 100:00:01 instead, which looks different from the first one, but the two are similar.

Is there a better way to do this?

I was using string replace, which is not a universal solution:

new_text = text.replace('Speaker 000:00:00', '')
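For the example text, this call removes nothing: str.replace only matches the exact literal 'Speaker 000:00:00', while the label in the text is 'Speaker 100:00:03'. A quick check using the text from the question:

```python
text = "Yeah Cool\nSpeaker 100:00:03Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,"

# str.replace looks for this exact literal; the text contains 'Speaker 100:00:03' instead
new_text = text.replace('Speaker 000:00:00', '')
print(new_text == text)  # True: the label is still there
```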

This is the result I expect after applying the regex:

text = "Yeah Cool Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,"

Depending on the exact format of the timestamp, re.sub with the following pattern should work:

>>> re.sub(r'\nSpeaker \d{1,3}:\d{2}:\d{2}', ' ', text)
'Yeah Cool Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,'
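Here is a self-contained sketch of the same substitution, run on both timestamps mentioned in the question (the sample strings are shortened from the question's text). It uses a raw string (r'...') so that \d reaches the regex engine unchanged; in an ordinary string literal, '\d' is an invalid escape sequence that recent Python versions warn about.

```python
import re

# Raw string: backslashes are passed to the regex engine as-is
PATTERN = r'\nSpeaker \d{1,3}:\d{2}:\d{2}'

texts = [
    "Yeah Cool\nSpeaker 100:00:03Uh, you know, when you score three goals,",
    "Yeah Cool\nSpeaker 100:00:01Uh, you know, when you score three goals,",
]

# Replace the newline plus the label with a single space
cleaned = [re.sub(PATTERN, ' ', t) for t in texts]
for line in cleaned:
    print(line)  # Yeah Cool Uh, you know, when you score three goals,
```

Note that \d{1,3} allows at most three digits before the first colon, so a hypothetical label such as Speaker 1000:00:05 would not match.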

A very simple regular expression:

import re
text = "Yeah Cool\nSpeaker 100:00:03Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,"
re.sub(r'\nSpeaker \d\d\d:\d\d:\d\d', ' ', text)
# 'Yeah Cool Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,'
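If a transcript contains several labels, or one appears at the very start of the string with no newline before it, a slightly looser pattern can be compiled once and reused. This is a sketch under assumptions the question does not state: any run of whitespace (or none) may precede a label, and the number before the first colon may have any number of digits (\d+). strip_labels is a hypothetical helper name, and the two-speaker sample is made up.

```python
import re

# Assumption: the label may open the transcript or follow any whitespace,
# so the leading \n is generalized to \s* (zero or more whitespace characters)
LABEL = re.compile(r'\s*Speaker \d+:\d{2}:\d{2}')

def strip_labels(transcript: str) -> str:
    """Replace every speaker label with one space, then trim both ends."""
    return LABEL.sub(' ', transcript).strip()

# Made-up transcript with two labels, one at the very start
sample = "Speaker 100:00:01Hello\nSpeaker 200:00:05Hi there"
print(strip_labels(sample))  # Hello Hi there
```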

The pattern \nSpeaker \d\d\d:\d\d:\d\d can also be written more compactly with quantifiers:

r'\nSpeaker \d{3}:\d{2}:\d{2}'

\d matches a digit and {3} means exactly three times, so \d{3} matches three digits.
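To see the quantifiers in action, fullmatch reports whether an entire string fits a pattern. This small sketch contrasts {3} with the {1,3} form used earlier and the open-ended + quantifier; the test strings are arbitrary.

```python
import re

three_digits = re.compile(r'\d{3}')     # exactly three digits
one_to_three = re.compile(r'\d{1,3}')   # between one and three digits
one_or_more = re.compile(r'\d+')        # one or more digits

# fullmatch succeeds only if the whole string fits the pattern
print(bool(three_digits.fullmatch('100')))   # True
print(bool(three_digits.fullmatch('10')))    # False: only two digits
print(bool(one_to_three.fullmatch('7')))     # True
print(bool(one_or_more.fullmatch('12345')))  # True
```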

Try regex101.com; it's a great site for experimenting with regular expressions.
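For quick experiments in Python itself, re.finditer yields each match along with its position, somewhat like the match highlighting on regex101.com. A minimal sketch using the question's text:

```python
import re

text = "Yeah Cool\nSpeaker 100:00:03Uh, you know, when you score three goals, you expect to win a game, you know, but, uh,"

# Collect every match, then print its start/end offsets and matched text
matches = list(re.finditer(r'Speaker \d{3}:\d{2}:\d{2}', text))
for m in matches:
    print(m.start(), m.end(), repr(m.group()))  # 10 27 'Speaker 100:00:03'
```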

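Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.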

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM