
javascript/regex to ignore semicolons in double quotes

I've been stumped for a bit on this one. I have a string that is almost a semicolon-delimited string; it looks something like this:

one; two; three "four; five;six"; seven

I'd like to split this up using a regex in JavaScript into an array like this (i.e., ignoring any semicolons inside double quotes):

['one', 'two', 'three "four; five;six"', 'seven']

I've tried adapting known working CSV functions, but they don't seem to adapt to the third element ('three "four;five;six";').

It seems like a regex type of problem, but if a solution exists that uses more than a regex, I'm certainly interested!

Update: I should also note that there may be spaces before or after the semicolons in the quoted string. I've updated the example to reflect that.
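Since the question is open to approaches beyond a single regex, here is a minimal non-regex sketch (not taken from the answers): scan the string one character at a time, toggle an inQuotes flag on every double quote, and split only on semicolons seen while the flag is off. The function name splitOutsideQuotes is hypothetical, and the sketch assumes quotes are balanced and never escaped.

```javascript
// Split on semicolons that sit outside double quotes, trimming each field.
// Assumes quotes are balanced and never escaped.
function splitOutsideQuotes(str) {
  const fields = [];
  let current = '';
  let inQuotes = false;

  for (const ch of str) {
    if (ch === '"') {
      inQuotes = !inQuotes; // every double quote opens or closes a quoted run
      current += ch;
    } else if (ch === ';' && !inQuotes) {
      fields.push(current.trim()); // delimiter outside quotes: close the field
      current = '';
    } else {
      current += ch; // ordinary character (or a semicolon inside quotes)
    }
  }
  fields.push(current.trim()); // the last field has no trailing semicolon
  return fields;
}

// returns ['one', 'two', 'three "four; five;six"', 'seven']
console.log(splitOutsideQuotes('one; two; three "four; five;six"; seven'));
```

Because each field is trimmed, spaces around the delimiters are dropped while spaces inside the quotes are kept. A trailing semicolon produces a final empty string.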

Assuming you don't allow for escaped quotes inside your quotes (e.g. "this has \"escaped quotes\" inside"), then this should work:

var rx = /(?!;|$)[^;"]*(("[^"]*")[^;"]*)*/g;
var str = 'one; two; three "four;five;six"; seven';
var res = str.match(rx);
// res = ['one', ' two', ' three "four;five;six"', ' seven']

Note that you need the negative lookahead (?!;|$) at the beginning of the regex to keep it from matching the empty string; otherwise the match method also returns an empty string in front of each of the semicolons (and at the end of the string), because every part of the pattern can match zero characters.
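A quick comparison of the same pattern with and without the (?!;|$) guard, using the example string from the answer (the variable names are just for illustration):

```javascript
// The pattern from the answer, with and without the leading (?!;|$) guard.
const withGuard = /(?!;|$)[^;"]*(("[^"]*")[^;"]*)*/g;
const withoutGuard = /[^;"]*(("[^"]*")[^;"]*)*/g;
const str = 'one; two; three "four;five;six"; seven';

// ['one', ' two', ' three "four;five;six"', ' seven']
console.log(str.match(withGuard));

// ['one', '', ' two', '', ' three "four;five;six"', '', ' seven', '']
console.log(str.match(withoutGuard));
```

With the g flag, match() steps forward one character after each empty match, which is why exactly one empty string appears per semicolon plus one at the end of the input.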

Update:

I think this regular expression should work with escaped quotes as well (although I'd appreciate feedback on its correctness). I've also added an extra \s to the negative-lookahead pattern to strip off whitespace after the preceding semicolon.

/(?!\s|;|$)[^;"]*("(\\.|[^\\"])*"[^;"]*)*/g
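One way to check the escaped-quote handling is an input where an escaped quote is followed by a semicolon inside the quoted field. The sketch below compares the first pattern with this updated one on a made-up test string; the string literal contains the characters a "x \"; y"; b, and the expected results are written as JavaScript string literals.

```javascript
// Original pattern (no escape handling) and the updated pattern.
const simple = /(?!;|$)[^;"]*(("[^"]*")[^;"]*)*/g;
const escaped = /(?!\s|;|$)[^;"]*("(\\.|[^\\"])*"[^;"]*)*/g;

// Contains: a "x \"; y"; b
const input = 'a "x \\"; y"; b';

// Splits inside the quotes: ['a "x \\"', ' y', '', ' b']
console.log(input.match(simple));

// Keeps the quoted field intact: ['a "x \\"; y"', 'b']
console.log(input.match(escaped));
```

The first pattern treats the escaped quote as a closing quote and then splits on the semicolon that follows it, while the updated pattern consumes \" as a two-character escape via (\\.) and keeps the field whole.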

This strips spaces before and after semicolons:

'one; two; three "four;five;six"; seven'.match(/(?!;| |$)([^";]*"[^"]*")*([^";]*[^ ";])?/g)

['one', 'two', 'three "four;five;six"', 'seven']

'one ; two"; three ; "four" ; five ; "six ; seven'.match(/(?!;| |$)([^";]*"[^"]*")*([^";]*[^ ";])?/g)

['one', 'two"; three ; "four" ; five ; "six', 'seven']

It doesn't try to deal with escaped quotes, though.
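If both behaviors are needed, one option (an assumption, not something either answer proposes) is to reuse the escaped-quote pattern from the earlier answer, whose lookahead already skips whitespace after a semicolon, and then trim the whitespace that remains before each semicolon. splitFields is a hypothetical helper name.

```javascript
// Hypothetical helper: split on semicolons outside double quotes, allowing
// backslash-escaped quotes inside quoted runs and dropping the whitespace
// around each delimiter.
function splitFields(str) {
  // Escaped-quote pattern; its (?!\s|;|$) lookahead keeps a match from
  // starting on whitespace, a semicolon, or the end of input.
  const rx = /(?!\s|;|$)[^;"]*("(\\.|[^\\"])*"[^;"]*)*/g;
  // Whitespace before a semicolon is still part of the match, so trim it
  // off; match() returns null when nothing matches, hence the fallback.
  return (str.match(rx) || []).map((field) => field.trimEnd());
}

// returns ['one', 'two', 'three "four; \\"five\\" ;six"', 'seven']
console.log(splitFields('one ; two ; three "four; \\"five\\" ;six" ; seven'));
```

Only the end of each field is trimmed, so whitespace inside a quoted value (such as the space before ;six) is preserved.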

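Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please cite this site's URL or the original URL. For any questions, contact yoyou2525@163.com.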

 