Python regex: search for lines starting with certain characters within substrings

Question

I want to use a regular expression to search for lines that start with certain characters inside substrings. I have an SQL string:

qry = ''' 
with 
qry_1 as ( -- some text
   SELECT ID, 
          NAME
   FROM   ( ... other code...
),
qry_2 as ( 
    SELECT coalesce (table1.ID, table2.ID) as ID,
           NAME
   FROM (...other code...
),
qry_3 as (
-- some text
     SELECT id.WEATHER AS WEATHER_MORN,
            ROW_NUMBER() OVER(PARTITION BY id.SUN
                ORDER BY id.TIME) AS SUN_TIME,
            id.RAIN,
            id.MIST
   FROM (...other code..
-- some other text
)
'''

I can extract the subquery names with re.findall:

import re

# Inline flags such as (?i) must come first in the pattern (an error since Python 3.11)
sub = re.findall(r'(?i)(.+?) as \(', qry)

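where the output of sub is qry_1, qry_2, qry_3. I want to extract any line that starts with the characters --, within each of the blocks identified in sub. Something like this, which I got help with here, works for string values: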

# search substring between strings
params = [re.findall(r'^\w+|(?:--)|(?<=\.)(?:--)', i)
          for i in re.findall(r'(?i)\w+\sas\s\([\s\w\.,\n]+', qry)]
dict_result = {a: None if not b else b for a, *b in params}

dict_result = {k: dict_result[k] for k in sub}
dict_result

But how do I incorporate "starts with --"? The output should look like this:

{'qry_1' : 'some text', 'qry_2': 'None', 'qry_3': 'some text, some other text'}

Thanks in advance for any guidance.

Answer 1

For the example data, one option is to capture everything before as ( in group 1, and in group 2 capture all the lines after it that do not contain as (.

^(.+?) as \((.*(?:\n(?!.* as \().*)*)\n\)

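^ Start of a line (the pattern is used with re.MULTILINE)
(.+?) Capture group 1, as few characters as possible
as \( Match as (
( Capture group 2
- .* Match the rest of the line
- (?:\n(?!.* as \().*)* Optionally repeat: a newline followed by a whole line that does not contain as (
) Close group 2
\n\) Match a newline and )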
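To see what the two groups hold, here is a minimal sketch that runs the same pattern on a shortened, hypothetical sample (the names a and b and the sample text are made up for illustration, not taken from the question):

```python
import re

# The answer's pattern: group 1 = name before " as (", group 2 = the block's lines
# up to (not including) the line that starts with ")".
regex = r"^(.+?) as \((.*(?:\n(?!.* as \().*)*)\n\)"

# Hypothetical sample in the same shape as the question's SQL.
sample = """with
a as ( -- first
   SELECT 1
),
b as (
-- second
   SELECT 2
)
"""

# re.MULTILINE lets ^ match at the start of every line, not only the string.
pairs = re.findall(regex, sample, re.MULTILINE)
for name, body in pairs:
    print(repr(name), repr(body))
```

Because the repeated part refuses any line containing as (, group 2 stops before the next subquery starts, and backtracking then lands on the closing ) line.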

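Then you can use group 1 as the dict key and run re.findall on the value of group 2 to find the strings that start with --. The text after -- is captured in a capture group, and that captured text is what re.findall returns.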
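A small illustration of that re.findall behavior, on a hypothetical block body: when the pattern has exactly one capture group, findall returns only the captured text rather than the whole match.

```python
import re

# Hypothetical block body, shaped like group 2 of the answer's regex.
body = "\n-- some text\n   SELECT id.RAIN\n-- some other text"

# One capture group: findall returns just the group, so "-- " is dropped.
# "." does not match a newline, so each capture stops at the end of its line.
comments = re.findall(r"-- (.*)", body)
print(comments)

# No capture group: findall returns the whole match, including "-- ".
whole = re.findall(r"-- .*", body)
print(whole)
```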

import re

# Group 1: the name before " as (", group 2: the lines of that block
regex = r"^(.+?) as \((.*(?:\n(?!.* as \().*)*)\n\)"
dict_result = {}
s = qry  # the example SQL string from the question

for tup in re.findall(regex, s, re.MULTILINE):
    # Capture the text after each "-- " inside the block
    matches = re.findall(r"-- (.*)", tup[1])
    dict_result[tup[0]] = matches if matches else None

print(dict_result)

Output

{'qry_1': ['some text'], 'qry_2': None, 'qry_3': ['some text', 'some other text']}
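The answer's output keeps lists. If you want the comma-joined strings from the question's desired output instead, one option is to join the matches. This is a sketch, assuming Python's None (rather than the string 'None') is acceptable for blocks without comments; the SQL is the question's string with its "..." elisions left as-is.

```python
import re

# The SQL string from the question ("..." parts are elided in the original).
qry = '''
with
qry_1 as ( -- some text
   SELECT ID,
          NAME
   FROM   ( ... other code...
),
qry_2 as (
    SELECT coalesce (table1.ID, table2.ID) as ID,
           NAME
   FROM (...other code...
),
qry_3 as (
-- some text
     SELECT id.WEATHER AS WEATHER_MORN,
            ROW_NUMBER() OVER(PARTITION BY id.SUN
                ORDER BY id.TIME) AS SUN_TIME,
            id.RAIN,
            id.MIST
   FROM (...other code..
-- some other text
)
'''

regex = r"^(.+?) as \((.*(?:\n(?!.* as \().*)*)\n\)"

dict_result = {}
for name, body in re.findall(regex, qry, re.MULTILINE):
    matches = re.findall(r"-- (.*)", body)
    # Join several comments into one string, as in the question's desired output;
    # use None when the block has no "-- " comment at all.
    dict_result[name] = ", ".join(matches) if matches else None

print(dict_result)
```

Note that r"-- (.*)" matches -- anywhere in a line, not only at its start, which is why the inline comment on the qry_1 line is picked up.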


Solution 1
0 2021-04-25 18:26:51
