
Python regex replace string

I'm trying to achieve the following:

string = 'C:/some path to mp3/song (7) title and so on (1).mp3'

should become:

C:/some path to mp3/song (7) title and so on.mp3

To match it, I'm using the following regex:

pattern = '.*(\s\([0-9]+\))\.mp3'

The match group contains: (u' (1)',)
However, when I try to substitute the match like so:

processed = re.sub(pattern, '', string)

processed contains an empty string. How can I get re.sub() to replace only the match found above?

You were matching the entire string and replacing it. Use a lookahead and match only the whitespace and (1) before the final extension.
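
To see why the original call leaves nothing behind, it may help to compare the whole match with the captured group. This is a small sketch using the string and pattern from the question; the variable names whole_match, captured and emptied are just illustrative.

```python
import re

string = 'C:/some path to mp3/song (7) title and so on (1).mp3'
pattern = r'.*(\s\([0-9]+\))\.mp3'

match = re.match(pattern, string)
whole_match = match.group(0)   # everything the pattern consumed
captured = match.group(1)      # only the parenthesised group

# re.sub replaces the whole match (group 0), not just group 1,
# so the entire string is swapped for ''.
emptied = re.sub(pattern, '', string)

print(repr(whole_match))
print(repr(captured))
print(repr(emptied))
```

Because the leading .* and the trailing \.mp3 are part of the match, group 0 spans the full path, which is what re.sub() removes.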

Expanded RegEx:

\s*     (?# 0+ characters of leading whitespace)
\(      (?# match ( literally)
[0-9]+  (?# match 1+ digits)
\)      (?# match ) literally)
(?=     (?# start lookahead)
  \.    (?# match . literally)
  mp3   (?# match the mp3 extension)
  $     (?# match the end of the string)
)       (?# end lookahead)
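
In Python, the same expanded layout could be written with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch of that idea, reusing the string from the question; it is not part of the original answer.

```python
import re

# The expanded pattern, written with Python's verbose regex syntax.
pattern = re.compile(r"""
    \s*       # zero or more whitespace characters
    \(        # a literal opening parenthesis
    [0-9]+    # one or more digits
    \)        # a literal closing parenthesis
    (?=       # start lookahead: the match must be followed by
      \.      #   a literal dot
      mp3     #   the mp3 extension
      $       #   the end of the string
    )         # end lookahead
""", re.VERBOSE)

string = 'C:/some path to mp3/song (7) title and so on (1).mp3'
processed = pattern.sub('', string)
print(processed)
```

The lookahead checks for .mp3 at the end without consuming it, so only the whitespace and (1) are removed.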

Demo: Regex101

Implementation:

import re

pattern = r'\s*\([0-9]+\)(?=\.mp3$)'
processed = re.sub(pattern, '', string)

Notes:

  • mp3 can be replaced by [^.]+ to match any extension, or by (mp3|mp4) to match multiple extensions.
  • Use \s+ instead of \s* to require at least some whitespace before (1). Thanks, @SethMMorton.
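
The variations in the notes could be combined roughly as follows. The helper strip_counter and the constants ANY_EXTENSION and MP3_OR_MP4 are hypothetical names chosen for this sketch, and the alternation is written as a non-capturing group (?:mp3|mp4), which behaves the same as (mp3|mp4) here.

```python
import re

# Hypothetical patterns built from the notes above; the names are illustrative.
ANY_EXTENSION = re.compile(r'\s+\([0-9]+\)(?=\.[^.]+$)')
MP3_OR_MP4 = re.compile(r'\s+\([0-9]+\)(?=\.(?:mp3|mp4)$)')

def strip_counter(path, pattern=ANY_EXTENSION):
    """Remove a trailing ' (N)' that sits right before the file extension."""
    return pattern.sub('', path)

print(strip_counter('C:/some path to mp3/song (7) title and so on (1).mp3'))
print(strip_counter('clip (2).mp4', MP3_OR_MP4))
print(strip_counter('notes (3).txt', MP3_OR_MP4))  # not mp3/mp4, left unchanged
print(strip_counter('song(1).mp3'))                # no whitespace before (1), left unchanged
```

With \s+ the counter is removed only when whitespace precedes it, so a name like song(1).mp3 is left alone; switch back to \s* if that case should be stripped too.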
