PCRE regex: extract a single SQL statement from a string containing multiple statements with varying delimiters, and strip comments

I am trying to use regular expressions to extract individual SQL statements from a file that contains several SQL statements, alternate delimiters, and comments.

I am trying to match the following two patterns to isolate a statement and then, once an individual statement is isolated, strip its comments: "delimiter (del) (body) (del)", where (del) is a sequence of non-whitespace characters and (body) is any text that is not (del), or a comment that may contain (del); and "(not ;) ;".

The first pattern should allow any set of characters to be used as the delimiter.


I tried the following to match the first pattern:

"/\s*delimiter\s+(?<d>[^\s]+)\s*;?\s*(?<qstr>(((?!--|\g{d}).)+|--[^\R]*\R)*)\g{d}\s*;?/s"

and, if the first pattern fails, the following to match the second pattern:

"/\s*(?<qstr>(((?!--|;).)+|--[^\R]*\R)*);/s"

Then, if either succeeds, I replace matches of the following with an empty string:

"/--[^\n\r]*(?:\n|\r)*/"

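As a small illustration of that last step, the sketch below applies the comment-stripping pattern to a made-up statement. The sample text is an assumption, not data from the question. Note that the pattern would also remove a `--` that appears inside a quoted string.

```php
<?php
// Pattern copied from the question: a "--" comment up to the end of the line,
// plus any line-break characters that follow it.
$commentPattern = "/--[^\n\r]*(?:\n|\r)*/";

// Made-up input with two trailing comments.
$statement = "create table t -- movie table\n(\n  id int -- key\n)";

// Each comment, including the line break after it, is replaced by nothing.
$stripped = preg_replace($commentPattern, "", $statement);

echo $stripped, "\n";
```

Because the line break after each comment is removed as well, the text that followed the comment is joined onto the same line.
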
My problem is that Apache crashes on preg_match when I search for either of the first two regular expressions in the following string:

"delimiter $$
create table MovieDetail
(
   imdbid varchar(32) primary key not null,
   title varchar(512),
   year int,
   rated varchar(16),
   released int,
   runtime int,
   director varchar(128),
   writer varchar(12),
   plot varchar(2048),
   imageurl varchar(512),
   rating float,
   ratingcount int,
   type varchar(64)
); $$
detect this text as a separate statement"

The first match should be

"delimiter $$
create table MovieDetail
(
   imdbid varchar(32) primary key not null,
   title varchar(512),
   year int,
   rated varchar(16),
   released int,
   runtime int,
   director varchar(128),
   writer varchar(12),
   plot varchar(2048),
   imageurl varchar(512),
   rating float,
   ratingcount int,
   type varchar(64)
); $$"

and the subpattern <qstr> should be

"create table MovieDetail
    (
       imdbid varchar(32) primary key not null,
       title varchar(512),
       year int,
       rated varchar(16),
       released int,
       runtime int,
       director varchar(128),
       writer varchar(12),
       plot varchar(2048),
       imageurl varchar(512),
       rating float,
       ratingcount int,
       type varchar(64)
    )"

The goal is to extract the first SQL statement from a string containing multiple SQL statements. The script can then compute a new index into the string that accounts for the extracted statement and extract the next statement starting from that index. This lets my script execute the statements one at a time so that it can print an individual result for each one (failure, success, or the rows fetched by a query, if any).

delimiter is not part of SQL; I need it so that my script can define triggers or stored programs, which contain multiple SQL statements but must be treated as one.
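
The extract-then-advance loop described above could look roughly like the sketch below. It assumes a deliberately simplified pattern that only handles `;` and ignores comments, strings, and `delimiter`. The helper name `next_statement` is hypothetical. The offset argument of `preg_match` and the `PREG_OFFSET_CAPTURE` flag are used to keep every index relative to the original string.

```php
<?php
// Hypothetical helper (not from the question): returns array(statement, nextOffset),
// or null when no further ";"-terminated statement is found.
function next_statement($sql, $offset)
{
    // Simplified pattern: no comment, string, or "delimiter" handling.
    // The A modifier anchors the match at $offset.
    if (!preg_match('/\s*(?<qstr>[^;]+);/A', $sql, $m, PREG_OFFSET_CAPTURE, $offset)) {
        return null;
    }
    // $m[0][1] is the byte offset in $sql where the whole match starts.
    $next = $m[0][1] + strlen($m[0][0]);
    return array(trim($m['qstr'][0]), $next);
}

$sql = "create table a (id int);\ninsert into a values (1);\nselect * from a;";
$offset = 0;
$statements = array();
while (($hit = next_statement($sql, $offset)) !== null) {
    list($stmt, $offset) = $hit;
    $statements[] = $stmt; // a real script would execute $stmt here and print its result
}
print_r($statements);
```
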

I tried replacing the escape sequences with // (as in //s and //g) and it still crashes just the same. I tested both expressions on debugex.com and both are valid. I'm using XAMPP with Apache 2.4.17 and PHP 5.6.23 (VC11 X86 32bit thread safe) + PEAR.
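
A crash of this kind is often a stack overflow inside the PCRE matcher rather than a sign of an invalid pattern. That is an assumption here, not something the details above establish. PHP exposes the `pcre.recursion_limit` setting, and the PHP manual warns that a high value can exhaust the process stack. The sketch below lowers the limit and checks `preg_last_error()` so that a runtime failure is reported instead of being read as "no match". The wrapper name `checked_match` and the value 5000 are illustrative guesses.

```php
<?php
// Hypothetical wrapper (not from the question): runs preg_match and reports
// PCRE runtime errors instead of silently treating them as "no match".
function checked_match($pattern, $subject, &$matches, $offset = 0)
{
    $result = preg_match($pattern, $subject, $matches, 0, $offset);
    if ($result === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $result === 1;
}

// Assumption: a lower limit makes PCRE give up before the thread stack is
// exhausted. The value 5000 is a guess for illustration, not a tuned setting.
ini_set('pcre.recursion_limit', '5000');

$ok = checked_match('/delimiter\s+(?<d>\S+)/', "delimiter \$\$\ncreate table t (id int); \$\$", $m);
var_dump($ok, $m['d']);
```
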

Major update: I found out that the error only occurs when running the regular expression on multi-line strings, so I'm going to try comparing the binary data of the string with that of one in which the line breaks are replaced by \\n or \\r\\n.
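
One way to do such a byte-level comparison is with `bin2hex()`, sketched below on stand-in strings (the inputs are assumptions, not the real file contents). In the hex dump, `0d0a` is a CRLF line break and a lone `0a` is LF.

```php
<?php
// Stand-in inputs: the same text with Windows (CRLF) and Unix (LF) line breaks.
$crlf = "delimiter \$\$\r\ncreate table t (id int); \$\$";
$lf   = str_replace("\r\n", "\n", $crlf);

// bin2hex() exposes the raw bytes so the two variants can be compared.
echo bin2hex($crlf), "\n";
echo bin2hex($lf), "\n";

// Counting line-break bytes shows which convention an input uses.
$counts = array('cr' => substr_count($crlf, "\r"), 'lf' => substr_count($crlf, "\n"));
print_r($counts);
```
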

I also realized that the regular expressions above don't account for SQL string literals, so the updated expressions are

"/\s*delimiter\s+(?<d>[^\s]+)\s*;?\s*(?<qstr>(((?!--|\g{d})[^'])+|'([^']|'')*'|--[^\R]*\R)*)\g{d}\s*;?/sA"

and

"/\s*(?<qstr>((?!--|;)[^']|'([^']|'')*'|--[^\R]*\R)*);/sA"

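Two details of these patterns may matter. First, as far as I know, PCRE 8.x does not treat `\R` inside a character class as a line-break class, so `[^\R]*` reads as "anything except the letter R". With the `s` modifier that lets a comment run across several lines. Second, `((?!--|;)[^'])` consumes one character per group iteration. The sketch below is a hypothetical rewrite of the second pattern, not a verified fix for the crash: it consumes runs of ordinary characters with possessive quantifiers and uses `[^\r\n]` for the comment body.

```php
<?php
// Hypothetical rewrite of the second pattern (not from the question).
// A nowdoc keeps the quotes and backslashes of the pattern literal.
$pattern = <<<'RE'
/\s*(?<qstr>(?:[^;'\-]++|'(?:[^']++|'')*+'|--[^\r\n]*+\R?|-)*+);/A
RE;

// Made-up input: ";" inside a string and inside a comment must not end a statement.
$sql = "insert into t values ('a;b', 'it''s'); -- done; really\nselect 1-2;";

preg_match($pattern, $sql, $first);
// Continue right after the first match; A anchors the match at that offset.
preg_match($pattern, $sql, $second, 0, strlen($first[0]));

echo $first['qstr'], "\n";
echo $second['qstr'], "\n";
```

The comment is still part of the second `qstr`; it would be removed afterwards by the separate comment-stripping replacement.
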
The crash is related to multiple whitespace characters. If I do

preg_replace("/\s+/", " ", $sqlstr);

the crash is eliminated and the match works properly. Also, if I do

preg_replace("/[ ]+/", " ", preg_replace("/\R+/", "\n", $sqlstr));

the crash is also eliminated. I can use that as a workaround, but I don't like it because it doesn't preserve the original string. It might also cause problems when extracting subsequent SQL statements, since the index will correspond to the modified string rather than the original.

I found out that the source of the crash was the use of nested and repeated capturing subpatterns, but even beyond that, the expressions did not work. I eventually gave up and resorted to manual character processing.
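
For readers wondering what manual character processing might look like, here is a minimal sketch. It is not the code actually used. It assumes that a `delimiter X` directive applies only to the single statement that follows it (as in the first pattern), that comments start with `--`, and that strings are single-quoted with `''` as an escaped quote. The function name `split_sql` is hypothetical.

```php
<?php
// Hypothetical splitter: walks the input byte by byte and returns each statement
// (comments removed) with the byte offset where it starts in the ORIGINAL string.
function split_sql($sql)
{
    $statements = array();
    $len = strlen($sql);
    $i = 0;
    while ($i < $len) {
        $i += strspn($sql, " \t\r\n", $i);          // skip whitespace between statements
        $delim = ';';
        if (preg_match('/delimiter\s+(\S+)\s*/A', $sql, $m, 0, $i)) {
            $delim = $m[1];                          // custom delimiter for this statement only
            $i += strlen($m[0]);
        }
        $dl = strlen($delim);
        $start = $i;
        $text = '';
        while ($i < $len) {
            if (substr($sql, $i, 2) === '--') {
                $i += strcspn($sql, "\r\n", $i);     // drop the comment, keep the line break
            } elseif ($sql[$i] === "'") {
                $j = $i + 1;                         // copy a quoted string verbatim
                while ($j < $len) {
                    if ($sql[$j] === "'") {
                        if ($j + 1 < $len && $sql[$j + 1] === "'") {
                            $j += 2;                 // '' is an escaped quote
                            continue;
                        }
                        break;
                    }
                    $j++;
                }
                $text .= substr($sql, $i, $j - $i + 1);
                $i = $j + 1;
            } elseif (substr($sql, $i, $dl) === $delim) {
                $i += $dl;                           // end of this statement
                break;
            } else {
                $text .= $sql[$i++];
            }
        }
        if (trim($text) !== '') {                    // trailing unterminated text is kept too
            $statements[] = array('sql' => trim($text), 'offset' => $start);
        }
    }
    return $statements;
}

$sql = "delimiter \$\$\ncreate table t (\n  id int, -- key; \$\$ in a comment\n"
     . "  note varchar(8) default 'a;b'\n); \$\$\nselect 1;";
$parts = split_sql($sql);
print_r($parts);
```

Because the scanner never modifies the input, each `offset` refers to the original string, which avoids the index mismatch of the whitespace-normalizing workaround.
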
