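Python: How to find all matches in a multiline string that are not preceded by a particular word?
I have SQL code and I want to extract the table names that follow the "insert" keyword.
Basically, I want to extract using the following rules:
Example: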
import re
lines = """begin insert into table_1 end
begin insert table_2 end
select 1 --This is will not insert into table_3
begin insert into
table_4
end
/* this is a comment
insert into table_5
*/
insert into table_6
"""
p = re.compile( r'^((?!--).)*\binsert\b\s+(?:into\s*)?.*', flags=re.IGNORECASE | re.MULTILINE)
for m in re.finditer(p, lines):
    line = lines[m.start(): m.end()].strip()
    starts_with_insert = re.findall(r'insert.*', line, flags=re.IGNORECASE | re.MULTILINE | re.DOTALL)
    print(re.compile(r'insert\s+(?:into\s+)?', flags=re.IGNORECASE | re.MULTILINE | re.DOTALL).split(' '.join(starts_with_insert))[1].split()[0])
Actual result:
table_1
table_2
table_4
table_5
table_6
Expected result (table_5 should not be returned, because it sits between /* and */):
table_1
table_2
table_4
table_6
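Is there an elegant way to do this?
Thanks in advance.
Edit: Thanks for the solutions. Is it possible to do this with a pure regular expression, without removing lines from the original text?
I would like to show the line number at which each table name is found in the original string.
The updated code is below: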
import re
lines = """begin insert into table_1 end
begin insert table_2 end
select 1 --This is will not insert into table_3
begin insert into
table_4
end
/* this is a comment
insert into table_5
*/
insert into table_6
"""
p = re.compile( r'^((?!--).)*\binsert\b\s+(?:into\s*)?.*', flags=re.IGNORECASE | re.MULTILINE)
for m in re.finditer(p, lines):
    line = lines[m.start(): m.end()].strip()
    # 1-based number of the line on which the match ends, zero-padded to 6 digits
    line_no = str(lines.count("\n", 0, m.end()) + 1).zfill(6)
    table_names = re.findall(r'(?:\binsert\s*(?:into\s*)?)(\S+)', line, flags=re.IGNORECASE | re.MULTILINE | re.DOTALL)
    print('[line number: ' + line_no + '] ' + '; '.join(table_names))
I tried using lookahead / lookbehind to exclude the matches between /* and */, but that did not produce the result I expected.
Any help is appreciated. Thanks!
Use the re.sub() and re.findall() functions:
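# removing single line/multiline comments
# (flags go in by keyword: the fourth positional argument of re.sub() is `count`;
#  re.S is left out because it would let `.*` run across lines and strip everything)
stripped_lines = re.sub(r'/\*[\s\S]+?\*/\s*|.*--.*(?=\binsert).*\n?', '', lines, flags=re.I)
# extracting table names preceded by `insert` statement
tbl_names = re.findall(r'(?:\binsert\s*(?:into\s*)?)(\S+)', stripped_lines, flags=re.I)
print(tbl_names)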
Output:
['table_1', 'table_2', 'table_4', 'table_6']
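A note on passing flags to re.sub(): its signature is re.sub(pattern, repl, string, count=0, flags=0), so a flag value given as the fourth positional argument is read as count and no flag is actually applied. The short illustration below uses a made-up string (not from the question) to show the difference; the warning filter is only there because newer Python versions warn about the positional form.

```python
import re
import warnings

text = "aaaa AAAA"  # hypothetical sample string, for illustration only

with warnings.catch_warnings():
    # Newer Python versions emit a DeprecationWarning for positional count/flags.
    warnings.simplefilter("ignore", DeprecationWarning)
    # re.I has the integer value 2, so this call means count=2 with no flags:
    # only the first two lowercase "a" characters are replaced.
    positional = re.sub(r"a", "-", text, re.I)

# Passing the flag by keyword makes the match case-insensitive as intended.
keyword = re.sub(r"a", "-", text, flags=re.I)

print(positional)
print(keyword)
```

Passing flags by keyword, or compiling the pattern first with re.compile(pattern, flags), avoids the mix-up.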
import re
lines = """begin insert into table_1 end
begin insert table_2 end
select 1 --This is will not insert into table_3
begin insert into
table_4
end
/* this is a comment
insert into table_5
*/
insert into table_6
"""
# remove all /* */ and -- comments
# (the lazy `+?` stops at the first closing */ instead of the last one)
comments = re.compile(r'/\*(?:.*\n)+?.*\*/|--.*?\n', flags=re.IGNORECASE | re.MULTILINE)
for comment in comments.findall(lines):
    lines = lines.replace(comment, '')
# capture the first token after `insert` / `insert into`
fullSet = re.compile(r'insert\s+(?:into\s+)*(\S+)', flags=re.IGNORECASE | re.MULTILINE)
print(fullSet.findall(lines))
This gives:
['table_1', 'table_2', 'table_4', 'table_6']
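Both snippets above delete the comments first, which shifts line numbers. For the follow-up in the question (a single pattern, original text left untouched, line numbers reported), one possible sketch is to let the regex match comments as well as insert statements and simply skip the comment matches. The names sql, TOKEN and find_insert_targets are made up for this illustration, and the pattern assumes that `--` and `/*` never occur inside SQL string literals.

```python
import re

sql = """begin insert into table_1 end
begin insert table_2 end
select 1 --This is will not insert into table_3
begin insert into
table_4
end
/* this is a comment
insert into table_5
*/
insert into table_6
"""

# Comments are matched (and thereby consumed) by the first two alternatives;
# only the last alternative captures a table name.
TOKEN = re.compile(
    r"""
      /\*.*?\*/                               # block comment, may span lines
    | --[^\n]*                                # line comment up to end of line
    | \binsert\s+(?:into\s+)?(?P<table>\S+)   # insert [into] <table>
    """,
    flags=re.IGNORECASE | re.DOTALL | re.VERBOSE,
)


def find_insert_targets(text):
    """Return (line_number, table_name) pairs found outside comments."""
    results = []
    for m in TOKEN.finditer(text):
        if m.group("table") is None:
            continue  # this match was a comment, ignore it
        # 1-based line on which the table name itself starts
        line_no = text.count("\n", 0, m.start("table")) + 1
        results.append((line_no, m.group("table")))
    return results


for line_no, table in find_insert_targets(sql):
    print("[line number: %06d] %s" % (line_no, table))
```

Because the text is never modified, the offsets returned by the match objects map directly to lines of the original string. Here table_4 is reported on line 5, where the name itself appears, rather than on the line holding the insert keyword.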