
Extract string before and string after based on a regex match in python

I want to extract the strings before and after a relational operator (>, <, >=, <=, !=, =) using a regex in Python.

input:

Find me products where sales >= 200000 and profit > 20% by country

output:

[[sales,>=,200000],[profit,>,20%]]

I am able to get the string before the operator and the operator using

\w+(?=\s+([<>]=?|[!=]=))

How do I get the string after the operator as well, in the same list? Any help is much appreciated.

While pyOliv's answer already gives the wanted output, your use of a positive lookahead made me wonder whether a positive lookbehind might also be worth looking into. It could make identifying the pattern after the relational operator more flexible, e.g. if you do not know how many occurrences of relational operators to expect. The matching pattern would be:

(?<=\s[<>!]=\s)[0-9,%]+|(?<=\s[<>=]\s)[0-9,%]+

The lookbehind has the disadvantage that Python's re module requires it to have a fixed width: quantifiers such as "+", "*" or "?" are not allowed inside it, and neither is "|" between alternatives of different lengths. This leads to the slightly more cumbersome version above, where one lookbehind matches the two-character operators and another matches the one-character operators.
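To make this concrete, here is a minimal sketch that pairs the lookahead from the question with the lookbehind pattern above. It assumes both patterns find the comparisons in the same order, so the results can simply be zipped together. The variable names (before, after, triples, variable_width_ok) are only illustrative.

```python
import re

txt = "Find me products where sales >= 200000 and profit > 20% by country"

# Lookahead from the question: group(0) is the word before the operator,
# group(1) is the operator captured inside the lookahead.
before = [(m.group(0), m.group(1))
          for m in re.finditer(r"\w+(?=\s+([<>]=?|[!=]=))", txt)]

# Lookbehind from this answer: one fixed-width branch per operator length.
after = re.findall(r"(?<=\s[<>!]=\s)[0-9,%]+|(?<=\s[<>=]\s)[0-9,%]+", txt)

# Assumption: both lists line up one-to-one, in order of appearance.
triples = [[name, op, value] for (name, op), value in zip(before, after)]
print(triples)

# A variable-width lookbehind ("=?" makes the width 3 or 4) is rejected
# by the re module at compile time.
variable_width_ok = True
try:
    re.compile(r"(?<=\s[<>]=?\s)[0-9,%]+")
except re.error:
    variable_width_ok = False
print(variable_width_ok)
```

For the sentence from the question this prints [['sales', '>=', '200000'], ['profit', '>', '20%']] followed by False. Note that the lookahead from the question does not cover a bare "=", so the two lists would get out of step if the text contained one.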

You need to give more details about the strings you are looking through. Based on your example:

import re
txt = 'sales >= 200,000 and profit > 20%'
match = re.match(r"(.*) ([<>=!]{1,2}) (.*) .* (.*) ([<>=!]{1,2}) (.*)", txt)
# the pattern has six groups, so loop over 1..6
for i in range(1, 7):
    print(match.group(i))

output:

sales
>=
200,000
profit
>
20%
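The fixed pattern above assumes exactly two comparisons separated by a single word, and re.match returns None when the text does not have that shape. As a rough sketch, the same pattern can be wrapped in a small helper that checks for None and returns the groups as two triples. The names PAIR and two_comparisons are made up for this example.

```python
import re

# Same pattern as above, compiled once.
PAIR = re.compile(r"(.*) ([<>=!]{1,2}) (.*) .* (.*) ([<>=!]{1,2}) (.*)")

def two_comparisons(txt):
    """Return two [left, operator, right] lists, or None if txt does not fit."""
    match = PAIR.match(txt)
    if match is None:
        return None
    groups = match.groups()
    return [list(groups[:3]), list(groups[3:])]

print(two_comparisons("sales >= 200,000 and profit > 20%"))
print(two_comparisons("sales >= 200,000"))  # only one comparison
```

The first call prints [['sales', '>=', '200,000'], ['profit', '>', '20%']] and the second prints None. Because the outer groups are (.*), running this on a full sentence would also pull the surrounding words into the first and last group.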

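EDIT: For a more general case, this function returns a list of [left operand, operator, right operand] triples for any number of comparisons. Note that \w+ stops at "," and "%", so 200,000 is returned as 200 and 20% as 20: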

import re

def split_txt(txt):
    lst = re.findall(r"\w+ [<>=!]{1,2} \w+", txt)
    out = []
    for sub_list in lst:
        match = re.match(r"(\w+) ([<>=!]{1,2}) (\w+)", sub_list)
        out.append([match.group(1), match.group(2), match.group(3)])
    return out


txt = 'bbl sales >= 200,000 and profit > 20% another text id != 25'
a = split_txt(txt)
print(a)

out: [['sales', '>=', '200'], ['profit', '>', '20'], ['id', '!=', '25']]
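If the thousands separator and the percent sign should be kept, as in the output asked for in the question, one possible variation is a single findall with three capture groups and a wider character class for the right-hand side. This is only a sketch: the names COMPARISON and split_comparisons are hypothetical, and the choice of characters allowed in a value (word characters, ".", "," and "%") is an assumption about the data.

```python
import re

# Operators are listed longest first so that ">=" is not read as ">" then "=".
# The value class [\w.,%]+ is an assumption about what a value may contain.
COMPARISON = re.compile(r"(\w+)\s*(>=|<=|!=|==|>|<|=)\s*([\w.,%]+)")

def split_comparisons(txt):
    """Return one [left, operator, right] list per comparison found in txt."""
    # With several groups, findall yields one tuple of groups per match.
    return [list(groups) for groups in COMPARISON.findall(txt)]

txt = "bbl sales >= 200,000 and profit > 20% another text id != 25"
print(split_comparisons(txt))
```

This prints [['sales', '>=', '200,000'], ['profit', '>', '20%'], ['id', '!=', '25']]. A value followed directly by punctuation (for example "200,000," at the end of a clause) would keep the trailing comma, so the character class may need tightening for real input.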
