Splitting string into groups with regex?

I have strings that can contain a varying number of "groups". I need to split them, but I am having trouble doing so. Each group always starts with [A-Z]{2,5}, followed by a : and a value of varying length that may contain spaces. There is always a space in front of each group.

Example strings:

"YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"
"Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs   dsfdsfd1jkhjk23"

My code thus far:

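import re

D = "Test1 AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"

# Split on whitespace (not at the start of the string) that is followed by
# an uppercase letter whose next character is not whitespace.
g = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(D)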

As you can see, this works when the string starts with a single word, but it fails when the leading text contains several words separated by spaces.
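To make the failure concrete, here is a minimal sketch that runs the pattern from the question on a one-word prefix and on a multi-word prefix. The helper name split_original and the shortened second input are only for illustration.

```python
import re

# Pattern from the question, written as a raw string.
ORIGINAL = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)")

def split_original(text):
    """Split text with the question's pattern (illustrative helper)."""
    return ORIGINAL.split(text)

one_word = split_original("Test1 AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23")
multi_word = split_original("Billy Bob Thorton AA:213231 AB:aaaa")

print(one_word)    # the prefix "Test1" stays in one piece
print(multi_word)  # the prefix is broken apart at every capitalized word
```

The lookahead (?=[A-Z]) only checks for a single uppercase letter, so any capitalized word in the prefix also counts as a split point.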

You can use

re.split(r'(?!^)\s+(?=[A-Z]+:)', text)

See this regex demo.

Details:

  • (?!^) - a negative lookahead that matches a location not at the start of string (equal to (?<!^) but one char shorter)
  • \s+ - one or more whitespaces
  • (?=[A-Z]+:) - a positive lookahead that requires one or more uppercase ASCII letters followed by a : char immediately to the right of the current location.
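As a quick check, the split pattern above can be run against both example strings from the question. This is a sketch; the names GROUP_BOUNDARY and examples are chosen here for illustration.

```python
import re

# Whitespace (not at the start) that is followed by "UPPERCASE:".
GROUP_BOUNDARY = re.compile(r"(?!^)\s+(?=[A-Z]+:)")

examples = [
    "YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23",
    "Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs   dsfdsfd1jkhjk23",
]

results = [GROUP_BOUNDARY.split(text) for text in examples]
for parts in results:
    print(parts)
```

Because the lookahead requires the colon, spaces inside the leading text and inside the values are left alone, and the leading text comes back as the first list item.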

You can also use re.findall directly with this pattern:

([A-Z]{2,5}:\w+(?: +\w+)*)(?=(?: +[A-Z]+:|$))

See the demo:

https://regex101.com/r/6jf8EM/1

This way you don't need to filter out unwanted groups afterwards; you get only what you need.
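A minimal sketch of the re.findall approach, assuming the values consist only of word characters and spaces (that is all \w and the literal space in the pattern allow). The names GROUP, groups and as_dict are illustrative, and the dictionary step is an optional extra that is not part of the answer.

```python
import re

# One capture group, so findall returns a list of the captured strings.
GROUP = re.compile(r"([A-Z]{2,5}:\w+(?: +\w+)*)(?=(?: +[A-Z]+:|$))")

text = "YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"

groups = GROUP.findall(text)
print(groups)

# Optional: turn the "KEY:value" items into a dictionary.
as_dict = dict(item.split(":", 1) for item in groups)
print(as_dict)
```

Unlike the split-based approach, the leading text ("YellowSky" here) is not part of the result, because it never matches the KEY:value shape.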
