
Regex to select semicolons that are not enclosed in double quotes

I have a string like this:

a;b;"aaa;;;bccc";deef

I want to split the string on the delimiter ; only if the ; is not inside double quotes. So after the split, it will be:

a
b
"aaa;;;bccc"
deef

I tried using look-behind, but I'm not able to find a correct regular expression for splitting.

Regular expressions are probably not the right tool for this. If possible, you should use a CSV library, specifying ; as the delimiter and " as the quote character; this should give you the exact fields you are looking for.

That being said, here is one approach that works by ensuring that there is an even number of quotation marks between the ; we are considering splitting at and the end of the string.

;(?=(([^"]*"){2})*[^"]*$)

Example: http://www.rubular.com/r/RyLQyR8F19

This will break down if you can have escaped quotation marks within a string, for example a;"foo\"bar";c.
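As a rough sketch, the lookahead pattern above can be tried in Python with re.split. One adjustment is assumed here: the groups are written as non-capturing (?:...), because Python's re.split inserts the text of every capturing group into its result. The helper name split_unquoted is made up for this illustration.

```python
import re

# Same pattern as above, but with non-capturing groups (?:...): Python's
# re.split() inserts the text of every capturing group into its result.
SPLIT_RE = re.compile(r';(?=(?:(?:[^"]*"){2})*[^"]*$)')


def split_unquoted(line):
    """Split on semicolons that are followed by an even number of double quotes."""
    return SPLIT_RE.split(line)


fields = split_unquoted('a;b;"aaa;;;bccc";deef')
print(fields)  # ['a', 'b', '"aaa;;;bccc"', 'deef']

# An escaped quote makes the quote count odd, so the first semicolon is missed.
broken = split_unquoted(r'a;"foo\"bar";c')
print(broken)
```

The second call illustrates the limitation described above: the escaped quote is counted like any other quote, so the split no longer lines up with the fields.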

Here is a much cleaner example using Python's csv module:

import csv
import io

# Treat ; as the delimiter and " as the quote character.
reader = csv.reader(io.StringIO('a;b;"aaa;;;bccc";deef'),
                    delimiter=';', quotechar='"')
for row in reader:
    print('\n'.join(row))
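If the input can contain backslash-escaped quotes, the csv module may still cope, provided the reader is told about the escape character. The sketch below assumes that backslash is the escape character in your data; split_fields is a hypothetical helper name, not part of the csv module.

```python
import csv
import io


def split_fields(line, escapechar=None):
    """Parse one ;-delimited line and return its fields with the quotes removed."""
    reader = csv.reader(io.StringIO(line), delimiter=';', quotechar='"',
                        escapechar=escapechar)
    return next(reader)


plain = split_fields('a;b;"aaa;;;bccc";deef')
print(plain)    # ['a', 'b', 'aaa;;;bccc', 'deef']

# Assumption: a backslash escapes the quote character inside a quoted field.
escaped = split_fields(r'a;"foo\"bar";c', escapechar='\\')
print(escaped)  # ['a', 'foo"bar', 'c']
```

Note that the csv reader strips the surrounding quotes from each field, which differs from the output shown in the question.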

This is kind of ugly, but if you don't have \" inside your quoted strings (meaning you don't have strings that look like "foo bar \"badoo\" goo"), you can split on the " first and then assume that all your even-numbered array elements (counting from one) are, in fact, quoted strings, and split the odd-numbered elements into their component parts on the ; token.

If you do have \" in your strings, then you'll want to first convert those into some other temporary token that you'll convert back after you've performed the split.

Here's a fiddle: http://jsfiddle.net/VW9an/

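// Input containing escaped quotes (\") inside the quoted section.
var str = 'abc;def;ghi"some other dogs say \\"bow; wow; wow\\". yes they do!"and another; and a fifth';

// Hide escaped quotes behind a temporary token so they do not affect the split.
// The token is assumed not to occur anywhere in the input.
var strCp = str.replace(/\\"/g, "--##--");

// After splitting on ", even indices hold text outside quotes
// and odd indices hold text that was inside quotes.
var parts = strCp.split('"');

var allPieces = [];
parts.forEach(function (part, i) {
    if (i % 2 === 0) {
        // Outside quotes: split on the delimiter.
        part.split(';').forEach(function (piece) {
            allPieces.push(piece);
        });
    } else {
        // Inside quotes: keep the text intact and put the quotes back.
        allPieces.push('"' + part + '"');
    }
});

// Convert the temporary token back into escaped quotes.
allPieces = allPieces.map(function (piece) {
    return piece.replace(/--##--/g, '\\"');
});

console.log(allPieces);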

Regular expressions will only get messier and break on even minor changes. You are better off using a CSV parser with any scripting language. Perl has a built-in module (so you don't need to download anything from CPAN if there are any restrictions) called Text::ParseWords that allows you to specify the delimiter, so you are not limited to , . Here is a sample snippet:

#!/usr/local/bin/perl

use strict;
use warnings;

use Text::ParseWords;

my $string = 'a;b;"aaa;;;bccc";deef';
my @ary = parse_line(q{;}, 0, $string);

print "$_\n" for @ary;

Output (the quotes are stripped because the second argument to parse_line, keep, is 0):

a
b
aaa;;;bccc
deef

Match All instead of Splitting

Answering long after the battle because no one used the way that seems the simplest to me.

Once you understand that Match All and Split are Two Sides of the Same Coin, you can use this simple regex:

"[^"]*"|[^";]+

See the matches in the Regex Demo.

  • The left side of the alternation | matches complete quoted strings
  • The right side matches any run of characters that are neither ; nor "
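For illustration, the match-all pattern can be tried with Python's re.findall. This is only a sketch; the name match_fields is invented for the example.

```python
import re

# Left alternative: a complete quoted string. Right alternative: a run of
# characters that are neither ; nor ".
TOKEN_RE = re.compile(r'"[^"]*"|[^";]+')


def match_fields(line):
    """Return every field matched by the pattern, in order of appearance."""
    return TOKEN_RE.findall(line)


fields = match_fields('a;b;"aaa;;;bccc";deef')
print(fields)  # ['a', 'b', '"aaa;;;bccc"', 'deef']

# Caveat: an empty field between two semicolons produces no match at all.
sparse = match_fields('a;;b')
print(sparse)  # ['a', 'b']
```

Because the pattern matches fields rather than delimiters, empty fields disappear from the result, so this variant fits best when empty fields do not need to be preserved.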
