
Perl Regex to match a string that is not enclosed in quotes

I'm trying to write a regex that matches a string that is NOT inside quotes (double or single), but the best I can do so far is a loop that iterates through every character of the string. There must be a simpler, more elegant solution.

Example: if I want to replace foo with bar, the string hello foo! should become hello bar!, but the string you said "my name is foo" should stay the same.

Could anyone help with a regex that achieves this?

One way is to use a negative lookahead:

perl -pe 's/foo(?![^"]*"(?:[^"]*"[^"]*")*[^"]*$)/bar/g' input

That is, foo is replaced only if the number of double quotes after it is even (not odd), i.e. it is not inside a quoted span. This assumes the double quotes in the input are balanced, and it ignores single quotes entirely.
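Below is a minimal Perl sketch (not part of the original answer) that wraps the same lookahead in a hypothetical helper, replace_unquoted_foo, and writes it with the /x flag so each piece of the lookahead can carry a comment. Like the one-liner, it assumes balanced double quotes and ignores single quotes; it runs over the example input that follows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace foo with bar only where an even number of
# double quotes follows it, i.e. where foo is outside a "..." pair.
# Assumes balanced double quotes; single quotes are not considered.
sub replace_unquoted_foo {
    my ($line) = @_;
    $line =~ s/
        foo
        (?!                      # fail if what follows is ...
            [^"]*"               #   one quote,
            (?:[^"]*"[^"]*")*    #   then any number of quote pairs,
            [^"]*$               #   then no more quotes up to the end
        )
    /bar/gx;
    return $line;
}

my @input = (
    'hello foo!',
    '"foo" foo "foo"',
    'foo "hello" foo',
    '"foo" bar',
);

print replace_unquoted_foo($_), "\n" for @input;
```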

Example input:

hello foo!
"foo" foo "foo"
foo "hello" foo
"foo" bar

Example output:

hello bar!
"foo" bar "foo"
bar "hello" bar
"foo" bar

Update: Quick summary: you would need something like .NET's "balancing groups" to really handle this, but the short answer is that you can't do it if you also need to support single quotes, because single quotes double as apostrophes. No matter what, a sentence like this will trip you up: That's when foo said, "That's my line!" The apostrophes throw the quote balancing completely out of whack. You'll need to build a custom parsing engine.
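To make the apostrophe problem concrete, here is a small Perl sketch (an illustration, not from the answer) that runs the sentence above through the "match a quoted span or foo" alternation used in the last answer on this page, once treating both ' and " as quotes and once treating only " as a quote.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The example sentence: the only foo is outside the "..." quotation.
my $text = q{That's when foo said, "That's my line!"};

# Both ' and " treated as quote characters: quoted spans are kept,
# a bare foo becomes bar.
(my $both = $text) =~ s/("[^"]*"|'[^']*')|foo/defined $1 ? $1 : 'bar'/ge;

# Only " treated as a quote character.
(my $double = $text) =~ s/("[^"]*")|foo/defined $1 ? $1 : 'bar'/ge;

print "$both\n";     # That's when foo said, "That's my line!"
print "$double\n";   # That's when bar said, "That's my line!"
```

With both quote types, the apostrophe in That's opens a bogus single-quoted span that runs to the apostrophe inside the real quotation, swallowing foo, so nothing is replaced.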

Note: if this is for HTML attributes, I have written a regex that parses them exactly as you describe, and I believe it would work in Perl. But it also relies on delimiters such as the = sign and other HTML structure. In about 90% of those cases an XML/HTML parser is the best option (the remaining 10% is still a possibility).

As I mentioned in my comment on your question, more examples would allow more concrete answers. This is the answer for your limited example:

^([^"']*?)foo([^"']*)$

Lookarounds are easy for an intermediate regex writer, but they make code harder to maintain and are usually not what is needed. Also, anything that requires you to use a dot (.) in a regex is typically not as efficient as it could be (compare the negated character class [^"']* used above).

Use $1bar$2 as the replacement and you'll be golden. But again, as my comment says, this is based on your basic example, which assumes that the quoted part, if any, is your entire string, starting and ending with quotes. If you have different examples, they would help.
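A quick way to try Option 1 in Perl; option1 is a hypothetical wrapper name, while the pattern and the $1bar$2 replacement are taken from the answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Option 1: the pattern only matches when the whole string contains no
# quote characters. Group 1 is the text before foo, group 2 the text after.
sub option1 {
    my ($line) = @_;
    $line =~ s/^([^"']*?)foo([^"']*)$/$1bar$2/;
    return $line;
}

print option1('hello foo!'), "\n";                  # hello bar!
print option1('you said "my name is foo"'), "\n";   # you said "my name is foo"
```

Because the pattern is anchored and has no /g flag, it replaces only the first foo in a quote-free string.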

Addendum

Just for fun, I'll answer your question for two other scenarios. Option 1 is my original answer above.

Option 2 (as mentioned by Floris):

Hi foo, I said "hello"

Or

"hello", said foo to his friend.

If this is the case, where quoted text appears only BEFORE or only AFTER your search text (foo here), the answer is:

^(?:([^"']*?)foo(.*)|(.*?)foo([^"']*))$
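A sketch of using the Option 2 pattern in Perl. Only one of the two alternatives participates in a match, so either groups 1 and 2 or groups 3 and 4 are undefined; the hypothetical option2 helper below picks whichever pair is defined. A plain $1$3bar$2$4 replacement would also work, but it emits "uninitialized value" warnings under use warnings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Option 2: quoted text may appear before foo OR after it, not on both sides.
# Alternative 1 (groups 1, 2): no quotes before foo.
# Alternative 2 (groups 3, 4): no quotes after foo.
sub option2 {
    my ($line) = @_;
    $line =~ s{^(?:([^"']*?)foo(.*)|(.*?)foo([^"']*))$}
              {(defined $1 ? $1 : $3) . 'bar' . (defined $2 ? $2 : $4)}e;
    return $line;
}

print option2('Hi foo, I said "hello"'), "\n";
# Hi bar, I said "hello"
print option2('"hello", said foo to his friend.'), "\n";
# "hello", said bar to his friend.
```

A line with quotes on both sides of foo (Option 3 below) is left unchanged by this pattern.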

Option 3 (as seen in my comment below):

He said, "Hello", so then Foo told him, "Lawl, bye"

To handle this, we would have to count the quotes before and after foo, to make sure their number is even or that they "close out" (known as "balancing" in .NET regex). Neither option is available in your situation without some additional custom functions.
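One minimal sketch of such a custom function in Perl, assuming only double quotes matter, splits the line on the quote character. With split's LIMIT of -1, pieces at even indexes lie outside quotes and pieces at odd indexes lie inside them, which amounts to checking that the number of quotes before a match is even. replace_outside_double_quotes is a hypothetical name, and this is a split-based technique rather than a single regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace $pattern only outside double-quoted spans.
# split with LIMIT -1 keeps empty fields, so even-indexed pieces are
# outside quotes and odd-indexed pieces are inside. Text after an
# unmatched final quote is treated as quoted.
sub replace_outside_double_quotes {
    my ($text, $pattern, $replacement) = @_;
    my @pieces = split /"/, $text, -1;
    for my $i (grep { $_ % 2 == 0 } 0 .. $#pieces) {
        $pieces[$i] =~ s/$pattern/$replacement/g;
    }
    return join '"', @pieces;
}

my $line = 'He said, "Hello", so then Foo told him, "Lawl, bye"';
print replace_outside_double_quotes($line, qr/Foo/, 'Bar'), "\n";
# He said, "Hello", so then Bar told him, "Lawl, bye"
```

Apostrophes are simply not treated as quotes here, which sidesteps the That's problem but means single-quoted text is not protected.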

I needed to do this as well, so I solved it myself. This solution does not rely on balanced quotes, but it obviously won't handle apostrophes correctly if they come in pairs (two apostrophes are treated as a single-quoted span).

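#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # needed for the defined-or operator //

my @test = ( 'hello foo!',
             '"my name is foo"',
             'foo test "test foo test" test foo test "test foo test" test foo',
             "foo test 'test foo test' test foo test 'test foo test' test foo",
             '"foo test foo"',
             'foo test " foo test' );

foreach ( @test )
{
  # Match either a complete "..." or '...' span (captured in $1) or a bare foo.
  # Quoted spans are put back unchanged; for a bare foo, $1 is undef and // yields 'bar'.
  s!("[^"]*"|'[^']*')|foo!$1//'bar'!ge;
  print "$_\n";
}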
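If you need this in more than one place, the same technique can be wrapped in a hypothetical helper, replace_outside_quotes, that takes the search word and replacement as arguments; quotemeta makes the word match literally. This sketch uses defined $1 ? $1 : ... instead of // so it also runs on Perls older than 5.10.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reusable version of the loop above. Quoted spans ("..." or
# '...') are matched first and put back as-is; any other occurrence of
# $word is replaced.
sub replace_outside_quotes {
    my ($text, $word, $replacement) = @_;
    my $w = quotemeta $word;
    $text =~ s{("[^"]*"|'[^']*')|$w}{defined $1 ? $1 : $replacement}ge;
    return $text;
}

print replace_outside_quotes('foo test " foo test', 'foo', 'bar'), "\n";
# bar test " bar test
```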

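Disclaimer: The technical posts on this site are distributed under the CC BY-SA 4.0 license. If you need to repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.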

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM