
How to select text between two given words?

Language: Python 3.x, re library

I have a string as follows:

import re
# Triple quotes are needed because the query itself contains single quotes.
query_string = r'''SELECT "a"."name", "a"."create_date", "a"."state", SUM("b"."cost") AS "amount", SUM("b"."cost") FILTER (WHERE "a"."state" = 'UNPAID') AS "paid", SUM("b"."cost") FILTER (WHERE "a"."state" = 'PAID') AS "unpaid" FROM "maintenance"'''

I want to select the column names, i.e. "a"."name", "a"."create_date", "a"."state", from the string above.

These come between "SELECT" and the first "SUM(...)". Any help is appreciated.

I have tried the regular expression patterns below:

1) r'SELECT (.* ), [^(SUM(.* )]'

2) r'SELECT (.* ), SUM(.* )'

but neither gives the correct result.

Expected result (no comma at the end):

"a"."name", "a"."create_date", "a"."state"

Use:

(SELECT\s*)([^()]+)(,\s*SUM.*)

and read the second group (match.group(2) in Python, or \2 in a replacement string). Alternatively, replace groups \1 and \3 with nothing.

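As a rough sketch of both options in Python, using the query from the question (the variable names here are only illustrative):

```python
import re

# Query from the question; triple quotes let the inner single quotes stand.
sql = '''SELECT "a"."name", "a"."create_date", "a"."state", SUM("b"."cost") AS "amount", SUM("b"."cost") FILTER (WHERE "a"."state" = 'UNPAID') AS "paid", SUM("b"."cost") FILTER (WHERE "a"."state" = 'PAID') AS "unpaid" FROM "maintenance"'''

pattern = re.compile(r'(SELECT\s*)([^()]+)(,\s*SUM.*)')

# Option 1: read the second group directly.
match = pattern.search(sql)
columns = match.group(2) if match else None

# Option 2: replace the whole match with the second group only,
# which drops groups 1 and 3.
columns_via_sub = pattern.sub(r'\2', sql)

print(columns)
print(columns_via_sub)
```

The [^()]+ part cannot cross a parenthesis, so the second group can never run past the first SUM( call.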

You can use

(?:SELECT\s*)(.*?)(?:,\s*SUM.*)

to create a single capturing group.

The two (?:...) groups are non-capturing, so only the middle group is returned.
The (.*?) group is non-greedy (lazy): it stops before the FIRST ", SUM" instead of the last one.
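To see what the non-greedy quantifier changes, here is a small sketch that runs the pattern above next to a greedy variant. The greedy variant and the variable names are my own additions for comparison only.

```python
import re

sql = '''SELECT "a"."name", "a"."create_date", "a"."state", SUM("b"."cost") AS "amount", SUM("b"."cost") FILTER (WHERE "a"."state" = 'UNPAID') AS "paid", SUM("b"."cost") FILTER (WHERE "a"."state" = 'PAID') AS "unpaid" FROM "maintenance"'''

# Lazy (.*?): expands one character at a time and stops at the first ", SUM".
lazy = re.search(r'(?:SELECT\s*)(.*?)(?:,\s*SUM.*)', sql).group(1)

# Greedy (.*): grabs everything, then backtracks only as far as the last ", SUM".
greedy = re.search(r'(?:SELECT\s*)(.*)(?:,\s*SUM.*)', sql).group(1)

print(lazy)    # only the plain columns
print(greedy)  # also contains the first two SUM(...) expressions
```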

You can use:

import re

# Triple quotes let the query keep its inner single quotes.
sql = '''SELECT "a"."name", "a"."create_date", "a"."state", SUM("b"."cost") AS "amount", SUM("b"."cost") FILTER (WHERE "a"."state" = 'UNPAID') AS "paid", SUM("b"."cost") FILTER (WHERE "a"."state" = 'PAID') AS "unpaid" FROM "maintenance"'''

res = re.search(r'SELECT (.+?), SUM', sql)
print(res.group(1))

Output:

"a"."name", "a"."create_date", "a"."state"

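One thing to watch: re.search returns None when the query has no ", SUM" part, so calling .group(1) on the result would raise AttributeError. A minimal sketch of a guarded version follows. The helper name extract_plain_columns is hypothetical, and the looser whitespace and case handling are my own assumptions, not part of the pattern above.

```python
import re

def extract_plain_columns(sql):
    """Hypothetical helper: text between SELECT and the first ', SUM', or None."""
    match = re.search(
        r'SELECT\s+(.+?),\s*SUM\b',        # \b keeps names like SUMMARY from matching
        sql,
        flags=re.IGNORECASE | re.DOTALL,   # assumption: allow lowercase and multi-line SQL
    )
    return match.group(1) if match else None

with_sum = extract_plain_columns('SELECT "a"."name", "a"."state", SUM("b"."cost") FROM "maintenance"')
without_sum = extract_plain_columns('SELECT "a"."name" FROM "maintenance"')

print(with_sum)
print(without_sum)
```

This is still plain text matching, not SQL parsing, so it only suits queries shaped like the one in the question.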