Regex to match string not inside parentheses

I've been struggling to find a regex that matches 3 different strings only if they aren't inside parentheses. So far I've only managed to handle the case where the string is right next to the parentheses, and that doesn't suit my situation.

To clarify, I need to match the strings "HAVING", "ORDER BY" and "GROUP BY" only when they aren't contained in any parentheses, even if the parentheses contain more than just that string.

For example, this:

Select *
from some_table
group by something;

should match, but:

Select *
from(
   Select *
   from some_other_table
   group by something_else
)

or

Select this, and_this
from(
   Select *
   from some_other_table
   having some_condition
)

shouldn't.
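For comparison, here is a minimal sketch of the naive approach (a single keyword regex written for illustration, not taken from the question). It reports a match for all three queries, because an ECMAScript regex has no recursion or balancing groups and therefore cannot tell how deeply a match is nested inside parentheses.

```javascript
// Naive check: looks for the keywords anywhere, ignoring parentheses entirely.
const naive = /\b(?:having|order\s+by|group\s+by)\b/i;

const queries = [
  `Select *
from some_table
group by something;`,
  `Select *
from(
   Select *
   from some_other_table
   group by something_else
)`,
  `Select this, and_this
from(
   Select *
   from some_other_table
   having some_condition
)`,
];

// All three results are true, although only the first query should match.
const naiveResults = queries.map((q) => naive.test(q));
console.log(naiveResults);
```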

I'm not an expert in JavaScript regex, so any help would be greatly appreciated.

I assume you want to check whether a given SQL query contains HAVING, ORDER BY or GROUP BY at the top level (not within a subquery).

This is complicated by the fact that both parentheses and words can appear inside string literals ('...'), quoted identifiers ("..."), and comments (-- ...).

In the following code I assume that these are the only things that can "go wrong" (i.e. there are no other quoting constructs) and that no character inside a quoted section is special (in particular, \ is not treated as an escape character).

Idea:

  • Remove all quoted constructs (string literals, quoted identifiers) and comments.
  • Remove all parenthesized groups.
  • Check the remaining string for your keywords.
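The second step needs a loop because [^()]* only matches a group with no parentheses inside it, so each pass removes only the innermost level of nesting. A minimal sketch of that step on its own (removeParenGroups and its pass counter are hypothetical, added only to make the passes visible):

```javascript
// Step 2 in isolation: repeatedly replace innermost (...) groups with a space.
// Each pass strips one nesting level; stop when a pass changes nothing.
function removeParenGroups(sql) {
  let passes = 0;
  let next;
  while ((next = sql.replace(/\([^()]*\)/g, ' ')) !== sql) {
    sql = next;
    passes++;
  }
  return { sql, passes };
}

const nested = 'select a from (select max(b) from (select b from t)) group by a';
const r = removeParenGroups(nested);
// Pass 1 removes "(b)" and "(select b from t)"; pass 2 removes the outer group.
console.log(JSON.stringify(r.sql), r.passes);
```

Only the top-level "group by a" survives, which is exactly what the final keyword check looks at.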

By "remove" I mean "replace with a space"; otherwise new tokens could be created where there were none before (e.g. hav(...)IN"asdf"g would turn into havINg if the parenthesized and quoted parts were simply deleted).
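A quick check of that example, as a sketch that reuses the answer's two regexes inside a hypothetical strip helper whose replacement text is a parameter:

```javascript
// Blank out quoted parts, -- comments and (nested) parenthesized groups
// using the given replacement string.
function strip(sql, replacement) {
  sql = sql.replace(/'[^']*'|"[^"]*"|--[^\n]*/g, replacement);
  let next;
  while ((next = sql.replace(/\([^()]*\)/g, replacement)) !== sql) {
    sql = next;
  }
  return sql;
}

const keyword = /\bhaving\b/i;
const tricky = 'hav(...)IN"asdf"g';

const withNothing = strip(tricky, ''); // "havINg": a keyword appears out of nowhere
const withSpace = strip(tricky, ' ');  // "hav IN g": still three separate tokens
console.log(keyword.test(withNothing), keyword.test(withSpace)); // true false
```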

Implementation:

function contains_groupy_bits(sql) {
    // Replace string literals, quoted identifiers and -- comments with a space.
    sql = sql.replace(/'[^']*'|"[^"]*"|--[^\n]*/g, ' ');
    // Replace innermost (...) groups with a space, repeating until nothing changes.
    let tmp;
    while ((tmp = sql.replace(/\([^()]*\)/g, ' ')) !== sql) {
        sql = tmp;
    }
    // Any keyword still present is at the top level.
    return /\b(?:having|order\s+by|group\s+by)\b/i.test(sql);
}

const examples = [
    `Select * from some_table group by something;`,
    `Select * from( Select * from some_other_table group by something_else )`,
    `Select this, and_this from( Select * from some_other_table having some_condition )`,
    `select name, count(*) from things where mark = '(' group by name -- )`,
];

for (const ex of examples) {
    console.log("'" + ex + "': " + contains_groupy_bits(ex));
}
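If you need to know where the top-level keywords are (for example, to split the query at that clause) rather than just whether they exist, one option is to mask each removed part with a run of spaces of the same length, so indices in the masked string line up with the original query. This is a sketch under the same assumptions as above; topLevelKeywords is a hypothetical name.

```javascript
// Same idea as contains_groupy_bits, but masks with same-length runs of spaces
// so match positions in the masked string are valid positions in the original.
function topLevelKeywords(sql) {
  const blank = (match) => ' '.repeat(match.length);
  // Mask '...' literals, "..." identifiers and -- comments.
  let masked = sql.replace(/'[^']*'|"[^"]*"|--[^\n]*/g, blank);
  // Mask innermost (...) groups until no more can be found.
  let next;
  while ((next = masked.replace(/\([^()]*\)/g, blank)) !== masked) {
    masked = next;
  }
  // Collect every remaining keyword with its index into the original string.
  const re = /\b(?:having|order\s+by|group\s+by)\b/gi;
  return Array.from(masked.matchAll(re), (m) => ({
    keyword: sql.slice(m.index, m.index + m[0].length),
    index: m.index,
  }));
}

const query = `select name, count(*)
from things
where mark = '('
group by name
order by name -- )`;

const hits = topLevelKeywords(query);
console.log(hits);
```

Because the lengths never change, hits[i].index can be used directly with query.slice().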
