Extracting a string before a colon or parenthesis with a regular expression in Python

Question

I'm trying to extract the string muscle pain from the following strings. I need a regular expression that works for all three cases.

string1 = 'A1 muscle pain: immunotherapy'
string2 = 'A2B_45 muscle pain: topical medicine e.g. ....'
string3 = 'A2_45 muscle pain (pain): topical medicine e.g. ....'

The following code works for string1 and string2, but not for string3: for string3 I always get muscle pain (pain). Can anyone help me with this? I've tried many different expressions but couldn't figure it out.

import re
re.match(r"^[A-Z]+\d*[A-Z]*_?\d*\s(.*)[:\(]", string3).group(1)
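A minimal sketch of why the pattern above fails for string3: (.*) is greedy, so the regex engine first consumes the rest of the line and then backtracks only as far as the last : or (. A lazy (.*?), which is not part of the original question and is shown here only for comparison, stops at the first one instead, but it still keeps the space in front of ( in the capture.

```python
import re

string3 = 'A2_45 muscle pain (pain): topical medicine e.g. ....'

# The asker's pattern: (.*) is greedy, so it runs to the end of the string
# and then backtracks only as far as the LAST ':' or '('.
greedy = re.match(r"^[A-Z]+\d*[A-Z]*_?\d*\s(.*)[:\(]", string3).group(1)

# Same pattern with a lazy (.*?): it stops at the FIRST ':' or '(',
# but the space just before '(' ends up inside the group.
lazy = re.match(r"^[A-Z]+\d*[A-Z]*_?\d*\s(.*?)[:\(]", string3).group(1)

print(repr(greedy))  # 'muscle pain (pain)'
print(repr(lazy))    # 'muscle pain '
```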

Answer 1

You can shorten the expression to:

^A\S+\s([^:(]*)(?=:|\s\()

^A Asserts the position at the beginning of the string, followed by a literal A.
\S+ One or more non-whitespace characters.
\s A whitespace character.
([^:(]*) Capture group. Matches and captures any run of characters other than a colon (:) or an opening parenthesis (().
(?=:|\s\() Positive lookahead for a colon, or for whitespace followed by (.
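The same pattern can be written with re.VERBOSE so that each piece from the list above sits next to its explanation. This is a sketch; the names PATTERN, strings and results are illustrative only.

```python
import re

# Answer 1's pattern, spelled out with re.VERBOSE (whitespace and
# '#' comments outside character classes are ignored).
PATTERN = re.compile(r"""
    ^A          # the string starts with a literal 'A'
    \S+         # then one or more non-whitespace characters, e.g. '2_45'
    \s          # then a single whitespace character
    ([^:(]*)    # capture everything that is neither ':' nor '('
    (?=:|\s\()  # stop where ':' or whitespace-then-'(' follows (not consumed)
""", re.VERBOSE)

strings = [
    'A1 muscle pain: immunotherapy',
    'A2B_45 muscle pain: topical medicine e.g. ....',
    'A2_45 muscle pain (pain): topical medicine e.g. ....',
]

results = [PATTERN.match(s).group(1) for s in strings]
print(results)  # ['muscle pain', 'muscle pain', 'muscle pain']
```

For string3, ([^:(]*) first grabs 'muscle pain ' (up to the parenthesis), the lookahead then fails on '(', and the engine gives back the trailing space so that \s\( can match.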


Python snippet:

import re
string1 = 'A1 muscle pain: immunotherapy'
string2 = 'A2B_45 muscle pain: topical medicine e.g. ....'
string3 = 'A2_45 muscle pain (pain): topical medicine e.g. ....'

for s in (string1, string2, string3):
    print(re.match(r'^A\S+\s([^:(]*)(?=:|\s\()', s).group(1))

Answer 2

Try this pattern: ^[\dA-Z_]+ ([^\(:]+)

It starts with [\dA-Z_]+ at the beginning (note the anchor ^), followed by a space. Then a capturing group matches everything until one of the unwanted characters is met: [^\(:]. You can add more "unwanted" characters to that class to change what the regex matches.

The first capturing group is what you want.
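A quick check of this pattern against all three strings, as a sketch (the names strings, pattern and results are illustrative). Note that for string3 the group stops at ( but keeps the space in front of it.

```python
import re

strings = [
    'A1 muscle pain: immunotherapy',
    'A2B_45 muscle pain: topical medicine e.g. ....',
    'A2_45 muscle pain (pain): topical medicine e.g. ....',
]

# Answer 2's pattern: a code made of digits, capitals and underscores,
# one space, then everything up to the first '(' or ':'.
pattern = re.compile(r"^[\dA-Z_]+ ([^\(:]+)")

results = [pattern.match(s).group(1) for s in strings]
print(results)  # ['muscle pain', 'muscle pain', 'muscle pain ']
```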


For string3, that pattern leaves a trailing space in the capture (the space before the parenthesis). To exclude it, you could try this pattern: ^[\dA-Z_]+ ([\w ]+)(?=(:| \())
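A sketch applying the lookahead version to all three strings (names are illustrative). The lookahead (?=...) only checks what follows without consuming it, so the space before ( stays outside the captured group.

```python
import re

strings = [
    'A1 muscle pain: immunotherapy',
    'A2B_45 muscle pain: topical medicine e.g. ....',
    'A2_45 muscle pain (pain): topical medicine e.g. ....',
]

# Capture word characters and spaces, but only up to a point where
# ':' or ' (' comes next; the lookahead itself is not consumed.
pattern = re.compile(r"^[\dA-Z_]+ ([\w ]+)(?=(:| \())")

results = [pattern.match(s).group(1) for s in strings]
print(results)  # ['muscle pain', 'muscle pain', 'muscle pain']
```

The inner parentheses in (:| \() form a second capturing group, so group(2) holds ':' or ' (' while group(1) is the text you want. Because [\w ] admits only word characters and spaces, a name containing other punctuation, for example 'A1 muscle-pain: x', makes match() return None.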

2 answers

Answer 1: 3 votes, 2018-08-06 18:07:02

Answer 2: 1 vote, accepted, 2018-08-06 19:19:18