I have strings that can contain a varying number of "groups". I need to split them, but I am having trouble doing so. Each group always starts with [A-Z]{2,5} followed by a : and a string of varying length that may contain spaces. There is always a space in front of each group.
Example strings:
"YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"
"Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs dsfdsfd1jkhjk23"
My code thus far:
import re
D = "Test1 AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"
g = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(D)
As you can see, this works when the leading string is a single word, but not when it contains multiple words.
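To make the problem concrete, here is a minimal sketch that runs the pattern from the question on both example strings (the constant and variable names are illustrative). Because the pattern splits at any space followed by an uppercase letter, each capitalized word of a multi-word prefix such as "Billy Bob Thorton" becomes its own item:

```python
import re

# Pattern from the question, written as a raw string so "\s" is not
# interpreted as a string escape.
QUESTION_PATTERN = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)")

one_word = "YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"
multi_word = "Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs dsfdsfd1jkhjk23"

print(QUESTION_PATTERN.split(one_word))
# ['YellowSky', 'AA:Hello', 'AB:1234', 'AC:1F 322', 'AD:hj21jkhjk23']

# Every space before a capital letter is a split point, so the name is broken up too.
print(QUESTION_PATTERN.split(multi_word))
# ['Billy', 'Bob', 'Thorton', 'AA:213231', 'AB:aaaa', 'AC:ddddd 322', 'AD:hj2ffs dsfdsfd1jkhjk23']
```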
You can use
re.split(r'(?!^)\s+(?=[A-Z]+:)', text)
See this regex demo.
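A quick way to check the split pattern on both example strings is a sketch like the following (the constant names are illustrative, not from the answer). The last lines also show the effect of (?!^) on a hypothetical string with a leading space, and a hypothetical stricter variant, not part of the answer, that enforces the question's 2 to 5 letter rule for group names:

```python
import re

# The answer's pattern: whitespace (not at the start of the string) that is
# immediately followed by one or more uppercase letters and a colon.
GROUP_SPLIT = re.compile(r"(?!^)\s+(?=[A-Z]+:)")

# Hypothetical stricter variant: only 2 to 5 uppercase letters count as a group name.
STRICT_SPLIT = re.compile(r"(?!^)\s+(?=[A-Z]{2,5}:)")

examples = [
    "YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23",
    "Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs dsfdsfd1jkhjk23",
]

for text in examples:
    print(GROUP_SPLIT.split(text))
# ['YellowSky', 'AA:Hello', 'AB:1234', 'AC:1F 322', 'AD:hj21jkhjk23']
# ['Billy Bob Thorton', 'AA:213231', 'AB:aaaa', 'AC:ddddd 322', 'AD:hj2ffs dsfdsfd1jkhjk23']

# (?!^) keeps a leading space from producing an empty first item.
print(GROUP_SPLIT.split(" AA:x AB:y"))  # [' AA:x', 'AB:y']

# A single-letter name such as "B:" starts a group for GROUP_SPLIT but not for STRICT_SPLIT.
print(GROUP_SPLIT.split("Name AA:x B:y"))   # ['Name', 'AA:x', 'B:y']
print(STRICT_SPLIT.split("Name AA:x B:y"))  # ['Name', 'AA:x B:y']
```

Whether to use [A-Z]+ or [A-Z]{2,5} depends on how strictly the group names in your data follow the stated format.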
Details:
(?!^) - a negative lookahead that matches a location that is not at the start of the string (equivalent to (?<!^), but one character shorter)
\s+ - one or more whitespace characters
(?=[A-Z]+:) - a positive lookahead that requires one or more uppercase ASCII letters followed by a : character immediately to the right of the current location
You can also use re.findall directly with this pattern:
([A-Z]{2,5}:\w+(?: +\w+)*)(?=(?: +[A-Z]+:|$))
See the demo: https://regex101.com/r/6jf8EM/1
This way you don't need to filter out unwanted groups afterwards; you get exactly the matches you need.
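The findall approach can be checked the same way. This is a minimal sketch that applies the pattern above to both example strings (the constant name is illustrative). The text before the first group, such as the name, is not part of the result. Because values are matched with \w+, a value containing another character (for example a period, as in the hypothetical last line) is skipped entirely as written:

```python
import re

# The answer's findall pattern: a 2-5 letter uppercase name, a colon, and one
# or more space-separated \w+ words, provided the next thing is another
# "NAME:" group or the end of the string.
GROUP_FINDALL = re.compile(r"([A-Z]{2,5}:\w+(?: +\w+)*)(?=(?: +[A-Z]+:|$))")

examples = [
    "YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23",
    "Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs dsfdsfd1jkhjk23",
]

for text in examples:
    print(GROUP_FINDALL.findall(text))
# ['AA:Hello', 'AB:1234', 'AC:1F 322', 'AD:hj21jkhjk23']
# ['AA:213231', 'AB:aaaa', 'AC:ddddd 322', 'AD:hj2ffs dsfdsfd1jkhjk23']

# Hypothetical value with a period: \w+ cannot cross it, so the AA group is not returned.
print(GROUP_FINDALL.findall("Name AA:1.5 AB:x"))  # ['AB:x']
```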