How to filter out a pattern with Python regular expressions, up to the input word

In Python, I want to extract a particular substring up to the input word provided.

Consider the following string:

"Name: abc and Age:24"

I want to extract the strings "Name: abc and" and "Age:24" separately. I am currently using the following pattern:

re.search(r'\S+\s*:[\S\s]+', pattern)

but the output is the whole string.
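A quick way to see why the whole string comes back is to run the pattern directly. This is a minimal sketch, assuming the pattern is meant to start with `\S` (a non-whitespace character) and that the input string is the one shown above: `[\S\s]+` matches any character and `+` is greedy, so it runs to the end of the string, and `re.search` only ever returns the first match.

```python
import re

s = "Name: abc and Age:24"

# [\S\s]+ matches any character (non-whitespace OR whitespace), and + is greedy,
# so after the first "Name:" it swallows everything up to the end of the string.
m = re.search(r'\S+\s*:[\S\s]+', s)
print(m.group())  # Name: abc and Age:24

# Even re.findall finds only one match, because the first one already
# consumed the whole string.
print(re.findall(r'\S+\s*:[\S\s]+', s))  # ['Name: abc and Age:24']
```

So the fix needs both a way to stop the value before the next key and a function that returns every match, which is what the answers below provide.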

You can use re.findall:

>>> import re
>>> s="Name: abc and Age:24"
>>> re.findall(r'[A-Za-z]+:[a-z\s]+|[A-Za-z]+:\d+',s)
['Name: abc and ', 'Age:24']

In the preceding pattern, since the keys in your string (Age and Name) start with uppercase letters, you can use [A-Za-z]+ to match them; it matches any combination of uppercase and lowercase letters, one or more characters long. For the rest of the string after the colon, [a-z\s]+ matches lowercase letters and whitespace. The second alternative matches the key the same way, but after the colon it matches only one or more digits (\d+).
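The same pattern can be written with `re.VERBOSE` so each piece carries its own comment. This is just a sketch of the identical regex in a more readable layout, not a different technique; verbose mode ignores literal whitespace outside character classes, so the pattern still uses `\s` for spaces.

```python
import re

s = "Name: abc and Age:24"

# Identical to r'[A-Za-z]+:[a-z\s]+|[A-Za-z]+:\d+', laid out with comments.
pattern = re.compile(r"""
    [A-Za-z]+:[a-z\s]+   # key, colon, then lowercase letters and whitespace
    |                    # or
    [A-Za-z]+:\d+        # key, colon, then one or more digits
""", re.VERBOSE)

matches = pattern.findall(s)
print(matches)  # ['Name: abc and ', 'Age:24']
```

Note the trailing space in the first match: [a-z\s]+ also consumes the space that precedes the next key.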

If the second part may contain letters after the colon, you can use \w instead of \d:

>>> s = "Name: abc def ghi Location:Earth"
>>> re.findall(r'[A-Za-z]+:[a-z\s]+|[A-Za-z]+:\w+',s)
['Name: abc def ghi ', 'Location:Earth']
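Because [a-z\s]+ also grabs the space before the next key, the first match keeps a trailing space ('Name: abc and ' rather than 'Name: abc and'). Here is a minimal sketch that strips it. It uses the \w variant of the pattern, and `extract_pairs` is a hypothetical helper name introduced only for illustration.

```python
import re

def extract_pairs(text):
    """Hypothetical helper: return each 'key:value' chunk without surrounding spaces."""
    pattern = r'[A-Za-z]+:[a-z\s]+|[A-Za-z]+:\w+'
    # [a-z\s]+ can end on the space before the next key, so strip each match.
    return [m.strip() for m in re.findall(pattern, text)]

print(extract_pairs("Name: abc and Age:24"))              # ['Name: abc and', 'Age:24']
print(extract_pairs("Name: abc def ghi Location:Earth"))  # ['Name: abc def ghi', 'Location:Earth']
```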

You need to use re.findall.

>>> s = "Name: abc and Age:24"
>>> re.findall(r'\S+\s*:.*?(?=\s*\S+\s*:|$)', s)
['Name: abc and', 'Age:24']
>>> re.findall(r'[^\s:]+\s*:.*?(?=\s*[^\s:]+\s*:|$)', s)
['Name: abc and', 'Age:24']
  • [^\s:]+ matches one or more characters that are neither a colon nor whitespace, so it matches the key.
  • \s*: matches zero or more whitespace characters followed by the colon.
  • .*? matches zero or more characters non-greedily, up to
  • (?=\s*[^\s:]+\s*:|$), i.e. the next key or the end of the line. (?=...) is a positive lookahead: it asserts that the enclosed pattern can match at the current position without consuming any characters.
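The bullet breakdown above maps directly onto a commented `re.VERBOSE` version of the second pattern. This sketch is the same regex laid out one piece per line; the three-key string in the second call ("City: New York") is a made-up input used only to show that the lookahead also stops before a later key.

```python
import re

s = "Name: abc and Age:24"

pattern = re.compile(r"""
    [^\s:]+              # key: one or more characters that are neither whitespace nor a colon
    \s*:                 # optional whitespace, then the colon
    .*?                  # value: as few characters as possible, up to
    (?=                  # a lookahead (consumes nothing) for either
        \s*[^\s:]+\s*:   #   the next key and its colon
        |                #   or
        $                #   the end of the string
    )
""", re.VERBOSE)

result = pattern.findall(s)
print(result)  # ['Name: abc and', 'Age:24']

print(pattern.findall("Name: abc and Age:24 City: New York"))
# ['Name: abc and', 'Age:24', 'City: New York']
```

Because the lookahead does not consume the space before the next key, the matches come out without trailing whitespace.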

OR

You could use re.split.

>>> re.split(r'\s+(?=[^\s:]+\s*:)', s)
['Name: abc and', 'Age:24']
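The split works because the lookahead checks for "key:" without consuming it, so only the whitespace is removed and the key stays at the start of the next piece. A small sketch on a longer, made-up input:

```python
import re

s = "Name: abc and Age:24 City: New York"

# Split only on whitespace that is immediately followed by a key and a colon.
parts = re.split(r'\s+(?=[^\s:]+\s*:)', s)
print(parts)  # ['Name: abc and', 'Age:24', 'City: New York']

# Caveat: a value that itself contains a colon looks like a new key.
print(re.split(r'\s+(?=[^\s:]+\s*:)', "Time: 10:30"))  # ['Time:', '10:30']
```

The same caveat applies to the findall lookahead pattern, since it uses the same "next key" test.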

You could use this regex:

\w+[:]\w+|\w+[:](\s)\w+|\w+(\s)[:]\w+

Here's a breakdown:

\w+[:]\w+

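\w matches a single word character (a letter, digit, or underscore), and the + after it means "one or more", so \w+ matches a whole word. [:] is a character class containing only the colon, so it matches one colon character. The final \w+ then matches the word after the colon.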

The | symbol is the alternation (OR) operator; I use it to also allow a whitespace character after or before the colon.

It matches the word before a colon and the word after it, and it also works when a single space comes before or after the colon.
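Two things are worth knowing when running this regex. First, it contains capturing groups, (\s), so re.findall returns tuples of the captured groups instead of the matched text; re.finditer with group(0) returns the full matches. Second, it matches only one word after the colon, so on the question's string it yields 'Name: abc' rather than 'Name: abc and'. A short sketch showing both:

```python
import re

s = "Name: abc and Age:24"
pattern = r'\w+[:]\w+|\w+[:](\s)\w+|\w+(\s)[:]\w+'

# With capturing groups present, findall returns the groups, not the matches.
print(re.findall(pattern, s))  # [(' ', ''), ('', '')]

# finditer yields match objects; group(0) is the whole matched text.
matches = [m.group(0) for m in re.finditer(pattern, s)]
print(matches)  # ['Name: abc', 'Age:24']
```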

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM