How to extract a pattern up to an input word with a Python regular expression

Question

In Python, I want to extract a particular substring up to a given input word.

Consider the following string:

"Name: abc and Age:24"

I want to extract the strings "Name: abc and" and "Age:24" separately. I am currently using the following pattern:

re.search(r'\S+\s*:[\S\s]+', pattern)

But the output is the whole string.
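A minimal reproduction of why this happens, assuming the key part is \S+ and the input is the sample string from the question: [\S\s] matches any character at all, and + is greedy, so once the first colon is found the match simply runs to the end of the string.

```python
import re

s = "Name: abc and Age:24"

# [\S\s] = "non-whitespace OR whitespace" = any character, and + is greedy,
# so everything after the first "key:" is swallowed, including "Age:24".
m = re.search(r'\S+\s*:[\S\s]+', s)
print(m.group())  # Name: abc and Age:24
```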

Answer 1

You can use re.findall:

>>> import re
>>> s="Name: abc and Age:24"
>>> re.findall(r'[A-Za-z]+:[a-z\s]+|[A-Za-z]+:\d+',s)
['Name: abc and ', 'Age:24']


In the pattern above, because the keys in your string (Name and Age) start with an uppercase letter, you can use [A-Za-z]+ to match them; it matches any combination of uppercase and lowercase letters of length 1 or more. For the rest of the first part, after the colon, you can use [a-z\s]+, which matches lowercase letters and whitespace. In the second part the key is matched the same way, but after the colon you just match one or more digits with \d+.
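Here is a sketch of the same pattern written with re.VERBOSE so each alternative carries its own comment. One detail worth knowing: [a-z\s]+ also consumes the space before "Age", so the first match ends in a trailing space; stripping each match (an addition, not part of the original answer) gives the exact strings from the question.

```python
import re

# The pattern from this answer, split into its two alternatives.
PATTERN = re.compile(
    r"""
    [A-Za-z]+:[a-z\s]+   # key, colon, then lowercase letters and whitespace ("Name: abc and ")
    |                    # or
    [A-Za-z]+:\d+        # key, colon, then digits ("Age:24")
    """,
    re.VERBOSE,
)

s = "Name: abc and Age:24"
# Strip the trailing whitespace that [a-z\s]+ picks up before the next key.
parts = [m.strip() for m in PATTERN.findall(s)]
print(parts)  # ['Name: abc and', 'Age:24']
```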

If the value in the second part might contain letters after the colon, you can use \w instead of \d:

>>> s = "Name: abc def ghi Location:Earth"
>>> re.findall(r'[A-Za-z]+:[a-z\s]+|[A-Za-z]+:\w+',s)
['Name: abc def ghi ', 'Location:Earth']

Answer 2

You need to use re.findall.

>>> s = "Name: abc and Age:24"
>>> re.findall(r'\S+\s*:.*?(?=\s*\S+\s*:|$)', s)
['Name: abc and', 'Age:24']
>>> re.findall(r'[^\s:]+\s*:.*?(?=\s*[^\s:]+\s*:|$)', s)
['Name: abc and', 'Age:24']

[^\s:]+ matches one or more characters that are neither a colon nor whitespace, so this matches the key.
\s*: matches zero or more whitespace characters followed by the colon.
.*? matches zero or more characters non-greedily (as few as possible), stopping as soon as
(?=\s*[^\s:]+\s*:|$) succeeds, that is, at the next key or at the end of the string. (?=...) is a positive lookahead: it asserts that the enclosed pattern can match at the current position, but it does not consume any characters.
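For readability, here is a sketch of the second findall pattern from this answer written with re.VERBOSE, so each piece carries the explanation from the breakdown above. The pattern itself is unchanged; only whitespace and comments are added (the name KEY_VALUE is just for illustration).

```python
import re

KEY_VALUE = re.compile(
    r"""
    [^\s:]+            # key: one or more characters that are neither whitespace nor ':'
    \s*:               # optional whitespace, then the colon
    .*?                # value: as few characters as possible...
    (?=                # ...followed by (lookahead, nothing consumed):
        \s*[^\s:]+\s*:     #   the next "key:"
      | $                  #   or the end of the string
    )
    """,
    re.VERBOSE,
)

s = "Name: abc and Age:24"
print(KEY_VALUE.findall(s))  # ['Name: abc and', 'Age:24']
```

Because the lookahead consumes nothing, the whitespace before "Age" is left for the next match to skip, which is why neither result has trailing spaces.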

Or

You could use re.split, splitting on whitespace that is followed by the next key and colon:

>>> re.split(r'\s+(?=[^\s:]+\s*:)', s)
['Name: abc and', 'Age:24']
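If the end goal is a key/value mapping rather than raw substrings, the re.split result can be post-processed with str.partition. This is a minimal sketch: parse_fields is a hypothetical helper name, and it assumes keys contain no whitespace or colons and that a value never contains a word immediately followed by a colon.

```python
import re

def parse_fields(text):
    """Hypothetical helper: turn 'key: value key2:value2' into a dict."""
    fields = {}
    # Split on whitespace that is followed by the next "key:" (same regex as above).
    for chunk in re.split(r'\s+(?=[^\s:]+\s*:)', text.strip()):
        key, _, value = chunk.partition(':')  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields

print(parse_fields("Name: abc and Age:24"))  # {'Name': 'abc and', 'Age': '24'}
```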


Answer 3

You could use this regex:

\w+[:]\w+|\w+[:](\s)\w+|\w+(\s)[:]\w+

Here's a breakdown:

\w+[:]\w+

\w matches a single word character (a letter, digit, or underscore), [:] matches a colon, and the + makes \w+ match one or more word characters, so \w+[:] matches the word just before the colon. The \w+ after the colon works the other way around, matching the word just after it :)

The | symbol is just an OR operator, which I use to also allow a space after or before the colon.

It matches the word before a colon together with the word after it, and it also works when there is a single space before or after the colon.
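A quick check of this regex on the sample string from the question. Two things are worth noting: the (\s) groups are capturing, so re.findall would return tuples of group contents rather than the matched text, which is why this sketch uses re.finditer with group(0); and because \w+ after the colon matches only one word, the first result is "Name: abc" rather than "Name: abc and".

```python
import re

s = "Name: abc and Age:24"
pattern = r'\w+[:]\w+|\w+[:](\s)\w+|\w+(\s)[:]\w+'

# group(0) is the full match; findall would return the (\s) groups instead.
matches = [m.group(0) for m in re.finditer(pattern, s)]
print(matches)  # ['Name: abc', 'Age:24']
```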

3 answers

Answer 1 (score 1), 2015-03-31 10:29:39

Answer 2 (score 0), 2015-03-31 10:30:22

Answer 3 (score 0), 2015-03-31 10:41:23
