
Skip text between quotes in Regex

I have the following regex, used with the IgnoreCase and Multiline options in .NET:

^.*?\WtextToFind\W.*?$

Given a multiline input like:

1 Some Random Text textToFind
2 Some more "textToFind" random text 
3 Another textToFinddd random text

The current regular expression matches lines 1 and 2. However, I need to skip all the lines in which textToFind is inside single or double quotes.

Any ideas how to achieve this?

Thanks!

EDIT:

Explanation: My purpose is to find some method calls inside VBScript code. I thought this would be irrelevant to my question, but after reading the comments I realised I should explain it.

So basically I want to skip text that is between double quotes or single quotes, and all the text between a single quote and the end of the line, since that would be a comment in VBScript. If I'm looking for myFunc:

Call myFunc("parameter") // should match
Call anotherFunc("myFunc") // should not match
Call someFunc("parameter") 'Comment myFunc // should not match
If(myFunc("parameter") And someFunc("myFunc")) // should match

With all of the possible cases involving mixed sets of quotes, a regex may not be your best option here. What you could do instead (after using your current regex to filter for everything but quotes) is count the number of quotes before and after the occurrence of textToFind. If both counts are odd, then you have quotes around your keyword and should scrap the line. If both are even, you've got matched quotes elsewhere (or no quotes at all), and should keep the line. Then repeat the process for double quotes. You could do all of this while walking through the string only once.
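A minimal sketch of the quote-counting idea, written in Python for brevity (the logic should port directly to .NET). The helper names is_enclosed and has_unquoted_match are made up for illustration, and the sketch assumes a line should be kept when at least one occurrence of the keyword sits outside quotes. It counts with str.count instead of a single pass, which is simpler but does more work than the one-walk version described above.

```python
import re

def is_enclosed(line: str, start: int, end: int, quote: str) -> bool:
    """True when an odd number of `quote` characters sits on both sides of line[start:end]."""
    before = line[:start].count(quote)
    after = line[end:].count(quote)
    return before % 2 == 1 and after % 2 == 1

def has_unquoted_match(line: str, keyword: str) -> bool:
    """True when the keyword occurs as a whole word outside both quote styles."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(line):
        # Apply the odd/odd test once for double quotes and once for single quotes.
        if not any(is_enclosed(line, match.start(), match.end(), q) for q in ('"', "'")):
            return True
    return False

sample = [
    "1 Some Random Text textToFind",
    '2 Some more "textToFind" random text ',
    "3 Another textToFinddd random text",
]
kept = [line for line in sample if has_unquoted_match(line, "textToFind")]
```

On the three sample lines from the question, kept ends up holding only line 1: line 2 has one double quote on each side of the keyword, and line 3 never matches as a whole word.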

Edit to address the update that you're searching through code: there are some additional considerations to take into account.

  • Escaped quotes (skip over the character after an escape character, and it won't be counted).
  • Commented quotes, e.g. /* " */ in the middle of a line. When you hit a /*, just jump to the next occurrence of */ and then continue inspecting characters. You may also want to check whether the occurrence of textToFind is in a comment.
  • End-of-line ' quotes: if one occurs (outside a literal string) before the keyword, it's not a valid method call.
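These points can be folded into one left-to-right scan. The Python sketch below assumes VBScript's rules: string literals are delimited by double quotes, a quote inside a string is written as two double quotes, and an apostrophe outside a string starts a comment that runs to the end of the line. VBScript has no /* */ block comments, so that case is left out, and Rem comments are not handled. The names mask_strings_and_comments and mentions_in_code are hypothetical.

```python
import re

def mask_strings_and_comments(line: str) -> str:
    """Return a copy of `line` with string literals and comments replaced by spaces."""
    out = []
    in_string = False
    for ch in line:
        if in_string:
            # A doubled "" closes and immediately reopens the string,
            # so the escaped quote never leaks out of the literal.
            if ch == '"':
                in_string = False
            out.append(" ")
        elif ch == '"':
            in_string = True
            out.append(" ")
        elif ch == "'":
            # Apostrophe outside a string: blank out the rest of the line.
            out.append(" " * (len(line) - len(out)))
            break
        else:
            out.append(ch)
    return "".join(out)

def mentions_in_code(line: str, name: str) -> bool:
    """True when `name` appears as a whole word outside strings and comments."""
    masked = mask_strings_and_comments(line)
    return re.search(r"\b" + re.escape(name) + r"\b", masked, re.IGNORECASE) is not None

lines = [
    'Call myFunc("parameter")',
    'Call anotherFunc("myFunc")',
    'Call someFunc("parameter") \'Comment myFunc',
    'If(myFunc("parameter") And someFunc("myFunc"))',
]
hits = [mentions_in_code(line, "myFunc") for line in lines]
```

Masking keeps every character at its original index, so match positions found in the masked copy can be mapped straight back to the source line.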

The bottom line is still that regexes aren't the droids you're looking for here. You're better off walking through the lines and parsing them.

It seems like this should work for your actual implementation in all the examples you've given:

/\bmyFunc\(/


This works as long as you don't have something like "i'm going to call myFunc()". But if you start trying to deal with quotes, multiple quotes, nested quotes, etc., it will get very messy (like trying to parse the DOM with regex).
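As a quick check, here is a Python run of the same pattern over the four sample lines from the question, plus a line of the problematic kind mentioned above. Python's re module is used only for convenience; for these ASCII inputs \b and \( should behave the same way in .NET. The MsgBox line is an invented example.

```python
import re

CALL = re.compile(r"\bmyFunc\(")

lines = [
    'Call myFunc("parameter")',
    'Call anotherFunc("myFunc")',
    'Call someFunc("parameter") \'Comment myFunc',
    'If(myFunc("parameter") And someFunc("myFunc"))',
    'MsgBox "i\'m going to call myFunc()"',  # invented line: a call spelled out inside a string
]

# True means the pattern found what looks like a call to myFunc.
results = [CALL.search(line) is not None for line in lines]
for line, found in zip(lines, results):
    print(found, line)
```

The first four results agree with the expectations in the question (match, no match, no match, match). The last line is reported as a match even though the call sits inside a string literal, which is the limitation noted above.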

Also, it appears that you are checking within VBScript code. Comments in VBScript code start with a ', right? You could check for this as well. Since it looks like you are doing this on a line-by-line basis, this should work for that type of comment:

/^\s*[^'].*\bmyFunc\(/
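One caveat worth checking: because \s* can give back a space, [^'] is able to match a whitespace character, so the pattern above only rejects comment lines whose apostrophe is the very first character. The Python sketch below shows this and tries a stricter variant, ^[^']*\bmyFunc\(, which is a suggested alternative and not part of the original answer. It refuses any line with an apostrophe before the call, so it also skips trailing comments, but it will wrongly skip a call that follows a string literal containing an apostrophe.

```python
import re

ORIGINAL = re.compile(r"^\s*[^'].*\bmyFunc\(")
STRICTER = re.compile(r"^[^']*\bmyFunc\(")  # assumption: no apostrophe allowed before the call

def hit(pattern: re.Pattern, line: str) -> bool:
    return pattern.search(line) is not None

comment_at_col0 = "'myFunc(1)"
indented_comment = "  'myFunc(1)"
trailing_comment = 'Call someFunc("parameter") \'Comment myFunc(1)'
real_call = 'Call myFunc("parameter")'
call_at_col0 = "myFunc(1)"

checks = {
    "original rejects comment at column 0": not hit(ORIGINAL, comment_at_col0),
    "original accepts indented comment": hit(ORIGINAL, indented_comment),
    "original accepts trailing comment": hit(ORIGINAL, trailing_comment),
    # [^'] must consume one character, so a call at column 0 is missed.
    "original misses call at column 0": not hit(ORIGINAL, call_at_col0),
    "stricter rejects indented comment": not hit(STRICTER, indented_comment),
    "stricter rejects trailing comment": not hit(STRICTER, trailing_comment),
    "stricter accepts real call": hit(STRICTER, real_call),
    "stricter accepts call at column 0": hit(STRICTER, call_at_col0),
}
```

Every entry in checks evaluates to True. For anything beyond these simple cases, scanning the line character by character remains the more reliable route.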

