Python regex: extract the previous/next group from an anchor point in a string

Given a string containing four values:

1) Vehicle model        <- any number of alphanumeric words
2) Engine description   <- one word, directly before the power output
3) Power output         <- \d+KW
4) Optional keywords    <- any number of alphanumeric words

For example:
1-SERIE 118I 105KW EFF.DYN. BUSINESS LINE
MINI CLUBMAN 1.6T 128KW COOPER S
TWINGO 1.2 55KW

How can I extract these into Python variables using the re module?

I think the simplest approach is to first find the power output (an anchor point), then match the word before it to get the engine description, and then match everything before that to retrieve the model. Everything after the power output would be the optional keywords.
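
The anchor-point approach can be sketched without any lookbehind: find the power output with re.search, then treat the last word before it as the engine, everything earlier as the model, and everything after it as the keywords. This is a minimal sketch that assumes whitespace-separated words and exactly one \d+KW token per line; parse_vehicle is a hypothetical helper name, not something from the question.

```python
import re

# The anchor: a power output such as "105KW". \b stops it matching inside a longer token.
POWER = re.compile(r"\b(\d+)KW\b")


def parse_vehicle(line: str) -> dict:
    """Split a line around its power-output anchor (hypothetical helper)."""
    m = POWER.search(line)
    if m is None:
        raise ValueError(f"no power output in {line!r}")
    before = line[:m.start()].split()  # model words, then the engine word
    if not before:
        raise ValueError(f"no engine description in {line!r}")
    return {
        "model": " ".join(before[:-1]),      # everything except the last word
        "engine": before[-1],                # the word directly before the anchor
        "kw": int(m.group(1)),               # digits of the power output
        "keywords": line[m.end():].split(),  # zero or more trailing words
    }


for text in ["1-SERIE 118I 105KW EFF.DYN. BUSINESS LINE",
             "MINI CLUBMAN 1.6T 128KW COOPER S",
             "TWINGO 1.2 55KW"]:
    print(parse_vehicle(text))
```

Splitting on whitespace instead of writing one large pattern keeps each field's rule readable, at the cost of assuming single-token engine descriptions.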

I feel I need to use a lookbehind, (?<= ..), but I can't get it to work.
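
Python's built-in re module only accepts fixed-width lookbehind, which is likely why (?<= ..) is hard to make work here: the text in front of each field has variable length. Lookahead has no such restriction. This illustrative snippet uses one of the example lines to show both behaviours.

```python
import re

line = "MINI CLUBMAN 1.6T 128KW COOPER S"

# Variable-width lookbehind such as (?<=\d+KW) is rejected by Python's re.
try:
    re.compile(r"(?<=\d+KW) .*")
except re.error as exc:
    print("lookbehind rejected:", exc)

# Lookahead may be any width: the word that is followed by " <digits>KW".
engine = re.search(r"\S+(?= \d+KW)", line).group()
print(engine)  # 1.6T

# A fixed-width lookbehind is fine, e.g. everything after the literal "KW ".
keywords = re.search(r"(?<=KW ).*", line).group()
print(keywords)  # COOPER S
```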

Slightly modified from Matt G.'s answer (adds named groups and captures all the optional keywords):

^(?P<model>([\S\s]+?))(?= \S+(?= \d+KW)) (?P<engine>(\S+))(?=(?= \d+KW)) (?P<kw>(\d+))KW(?P<keywords>(?<=KW)\s?(.*))
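
To turn the named-group pattern into Python variables, compile it and read the groups by name. This is a minimal sketch: parse_named is a hypothetical helper, and converting kw to int and splitting the keywords into a list are assumptions about the desired output. The keywords group also captures the optional space after KW, which is why the sketch calls split() on it.

```python
import re

# The named-group pattern from the answer, split across lines (the pieces concatenate back to it).
PATTERN = re.compile(
    r"^(?P<model>([\S\s]+?))(?= \S+(?= \d+KW)) "
    r"(?P<engine>(\S+))(?=(?= \d+KW)) "
    r"(?P<kw>(\d+))KW(?P<keywords>(?<=KW)\s?(.*))"
)


def parse_named(line: str):
    """Return the four fields as a dict, or None if the line does not match (hypothetical helper)."""
    m = PATTERN.match(line)
    if m is None:
        return None
    d = m.groupdict()  # only the named groups; the unnamed inner groups are ignored
    return {
        "model": d["model"],
        "engine": d["engine"],
        "kw": int(d["kw"]),
        "keywords": d["keywords"].split(),  # drops the leading space the group captures
    }


for text in ["1-SERIE 118I 105KW EFF.DYN. BUSINESS LINE",
             "MINI CLUBMAN 1.6T 128KW COOPER S",
             "TWINGO 1.2 55KW"]:
    print(parse_named(text))
```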

Try this regex: ^([\S\s]+?)(?= \S+(?= \d+KW)) (\S+)(?=(?= \d+KW)) (\d+)KW(?: ([^\s]+))*
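
In Python's re, a capturing group inside a repetition such as (?: ([^\s]+))* keeps only its last match, which is presumably what the modified version above addresses. This sketch demonstrates that with the pattern written as a raw string; slicing the text after KW is one possible workaround and is not part of either answer.

```python
import re

# Matt G.'s pattern: the optional keywords sit in a repeated capturing group (group 4).
ORIGINAL = re.compile(r"^([\S\s]+?)(?= \S+(?= \d+KW)) (\S+)(?=(?= \d+KW)) (\d+)KW(?: ([^\s]+))*")

line = "1-SERIE 118I 105KW EFF.DYN. BUSINESS LINE"
m = ORIGINAL.match(line)
print(m.groups())  # ('1-SERIE', '118I', '105', 'LINE'): only the last repetition survives

# Workaround (an assumption): take the text after "KW" (group 3 ends at the digits) and split it.
all_keywords = line[m.end(3) + len("KW"):].split()
print(all_keywords)  # ['EFF.DYN.', 'BUSINESS', 'LINE']
```

If a third-party dependency is acceptable, the regex package on PyPI provides Match.captures(), which returns every match of a repeated group.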
