I'd like to use regex to search for lines starting with certain characters within substrings. I have a SQL string -
qry = '''
with
qry_1 as ( -- some text
SELECT ID,
NAME
FROM ( ... other code...
),
qry_2 as (
SELECT coalesce (table1.ID, table2.ID) as ID,
NAME
FROM (...other code...
),
qry_3 as (
-- some text
SELECT id.WEATHER AS WEATHER_MORN,
ROW_NUMBER() OVER(PARTITION BY id.SUN
ORDER BY id.TIME) AS SUN_TIME,
id.RAIN,
id.MIST
FROM (...other code..
-- some other text
)
'''
I'm able to extract the subquery names with re.findall -
sub = re.findall(r'(.+?) as \(', qry, flags=re.IGNORECASE)
The output of sub is qry_1, qry_2, qry_3.
I'd also like to extract any lines starting with -- within the subqueries identified in sub. Something like this, which I got help with, works for string values -
# search substring between strings
params = [re.findall(r'^\w+|(?:--)|(?<=\.)(?:--)', i)
          for i in re.findall(r'\w+\sas\s\([\s\w\.,\n]+', qry, flags=re.IGNORECASE)]
dict_result = {a: None if not b else b for a, *b in params}
dict_result = dict([(k, dict_result[k]) for k in sub])
dict_result
But how can I incorporate the "starts with --" condition, so that the output looks like this -
{'qry_1' : 'some text', 'qry_2': 'None', 'qry_3': 'some text, some other text'}
Thank you for guidance here
For the example data, one option is to capture everything before " as (" in group 1, and then capture in group 2 all following lines that do not contain " as (".
^(.+?) as \((.*(?:\n(?!.* as \().*)*)\n\)
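^ - Start of the line (the pattern is used with re.MULTILINE)
(.+?) - Capture group 1, match as few characters as possible
 as \( - Match " as ("
( - Start capture group 2
.* - Match the rest of the line
(?:\n(?!.* as \().*)* - Optionally repeat a newline followed by a line that does not contain " as ("
) - Close group 2
\n\) - Match a newline and )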
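As a rough illustration of the breakdown above, the same pattern can be written with re.VERBOSE so that each part carries its own comment. This is only a sketch: the names CTE_BLOCK and sample are made up for the example, and the short sample string is a stand-in for the real query. In verbose mode unescaped whitespace is ignored, so the literal spaces around "as" are written as [ ].

```python
import re

# CTE_BLOCK and sample are hypothetical names used only for this sketch.
# re.VERBOSE ignores unescaped whitespace, so literal spaces are written as [ ].
CTE_BLOCK = re.compile(
    r"""
    ^(.+?)                         # group 1: the name at the start of a line
    [ ]as[ ]\(                     # the literal text " as ("
    (                              # group 2: the body of the subquery
        .*                         # the rest of the current line
        (?:\n(?!.*[ ]as[ ]\().*)*  # following lines that do not contain " as ("
    )
    \n\)                           # a newline followed by a closing parenthesis
    """,
    re.MULTILINE | re.VERBOSE,
)

# A shortened stand-in for the query in the question.
sample = "with\nqry_1 as ( -- note\nSELECT 1\n),\nqry_2 as (\nSELECT 2\n)\n"

blocks = CTE_BLOCK.findall(sample)
print(blocks)
```

Each tuple in blocks holds the subquery name and its body, which is the shape the second re.findall step works on.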
Then you could use group 1 as the key of the dict, and run re.findall on the value of group 2 to find the strings that start with -- and capture what follows in another capture group, which is what re.findall returns.
import re

regex = r"^(.+?) as \((.*(?:\n(?!.* as \().*)*)\n\)"
dict_result = {}
s = "the example string here"  # placeholder for the qry string from the question
for tup in re.findall(regex, s, re.MULTILINE):
    # tup[0] is the subquery name, tup[1] is its body
    matches = re.findall(r"-- (.*)", tup[1])
    dict_result[tup[0]] = matches if len(matches) > 0 else None
print(dict_result)
Output
{'qry_1': ['some text'], 'qry_2': None, 'qry_3': ['some text', 'some other text']}
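If a single comma-separated string per subquery is preferred, as in the output asked for in the question, the lists can be joined. The following is a minimal sketch under a few assumptions: the helper name comments_by_subquery and the names block_re and comment_re are hypothetical, re.IGNORECASE is added so that "AS (" would also be accepted, and a subquery without comments maps to None rather than the string 'None'. Note that the comment pattern would also pick up a -- that appears inside a SQL string literal.

```python
import re

# The example query from the question.
qry = """
with
qry_1 as ( -- some text
SELECT ID,
NAME
FROM ( ... other code...
),
qry_2 as (
SELECT coalesce (table1.ID, table2.ID) as ID,
NAME
FROM (...other code...
),
qry_3 as (
-- some text
SELECT id.WEATHER AS WEATHER_MORN,
ROW_NUMBER() OVER(PARTITION BY id.SUN
ORDER BY id.TIME) AS SUN_TIME,
id.RAIN,
id.MIST
FROM (...other code..
-- some other text
)
"""

# Hypothetical names for this sketch; the block pattern is the one from the answer.
block_re = re.compile(
    r"^(.+?) as \((.*(?:\n(?!.* as \().*)*)\n\)",
    re.MULTILINE | re.IGNORECASE,
)
comment_re = re.compile(r"--[ \t]*(.*)")


def comments_by_subquery(sql):
    """Map each subquery name to its comments joined by ', ', or None."""
    result = {}
    for name, body in block_re.findall(sql):
        # Collect the text after every "--" in this subquery body.
        comments = [c.strip() for c in comment_re.findall(body)]
        result[name] = ", ".join(comments) if comments else None
    return result


print(comments_by_subquery(qry))
```

Replacing None with the string 'None' in the helper would reproduce the exact dict shown in the question.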