Split a string by commas but ignore commas within double-quotes using Javascript

I'm looking for [a, b, c, "d, e, f", g, h] to turn into an array of 6 elements: a, b, c, "d,e,f", g, h. I'm trying to do this through Javascript. This is what I have so far:

str = str.split(/,+|"[^"]+"/g); 

But right now it's splitting out everything that's in the double-quotes, which is incorrect.

Edit: Okay, sorry, I worded this question really poorly. I'm being given a string, not an array.

var str = 'a, b, c, "d, e, f", g, h';

And I want to turn that into an array using something like the "split" function.

Here's what I would do.

var str = 'a, b, c, "d, e, f", g, h';
var arr = str.match(/(".*?"|[^",\s]+)(?=\s*,|\s*$)/g);

/* will match:

    (
        ".*?"       double quotes + anything but double quotes + double quotes
        |           OR
        [^",\s]+    1 or more characters excl. double quotes, comma or spaces of any kind
    )
    (?=             FOLLOWED BY
        \s*,        0 or more empty spaces and a comma
        |           OR
        \s*$        0 or more empty spaces and nothing else (end of string)
    )
    
*/
arr = arr || [];
// this will prevent JS from throwing an error in
// the below loop when there are no matches
for (var i = 0; i < arr.length; i++) console.log('arr['+i+'] =',arr[i]);
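Two caveats with this match-based approach are worth checking against your own data: empty fields produce no match at all, and an unquoted field that contains a space is only matched from its last word on. A minimal sketch that makes both visible (matchFields is a hypothetical wrapper name, not part of the answer):

```javascript
// Same pattern as in the answer above.
const FIELD_PATTERN = /(".*?"|[^",\s]+)(?=\s*,|\s*$)/g;

// matchFields is a hypothetical helper name used only for this illustration.
function matchFields(str) {
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return str.match(FIELD_PATTERN) || [];
}

console.log(matchFields('a, b, c, "d, e, f", g, h'));
// [ 'a', 'b', 'c', '"d, e, f"', 'g', 'h' ]

console.log(matchFields('a,,b'));
// [ 'a', 'b' ]  (the empty field between the two commas is skipped)

console.log(matchFields('Hello World, b'));
// [ 'World', 'b' ]  (only the last word of the unquoted field is matched)
```

If empty fields or multi-word unquoted values matter, a split-based or character-by-character approach is probably the safer choice.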

regex: /,(?=(?:(?:[^"]*"){2})*[^"]*$)/

const input_line = '"2C95699FFC68","201 S BOULEVARDRICHMOND, VA 23220","8299600062754882","2018-09-23"'

let my_split = input_line.split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/)

Output: 
my_split[0]: "2C95699FFC68", 
my_split[1]: "201 S BOULEVARDRICHMOND, VA 23220", 
my_split[2]: "8299600062754882", 
my_split[3]: "2018-09-23"

See the following link for an explanation: regexr.com/44u6o
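In short, the lookahead only accepts a comma when an even number of double quotes follows it up to the end of the string, which means the comma is not inside a quoted field. Here is a small sketch built on that regex; the helper name splitOutsideQuotes and the trimming/unquoting step are my own additions, not part of the answer:

```javascript
// A comma counts as a separator only if an even number of double quotes
// follows it up to the end of the string (i.e. it is outside any quoted field).
const COMMA_OUTSIDE_QUOTES = /,(?=(?:(?:[^"]*"){2})*[^"]*$)/;

// splitOutsideQuotes is a hypothetical helper name used for illustration.
function splitOutsideQuotes(line) {
  return line.split(COMMA_OUTSIDE_QUOTES).map((field) => {
    const trimmed = field.trim();
    // Strip one pair of surrounding quotes, if present.
    return trimmed.length >= 2 && trimmed.startsWith('"') && trimmed.endsWith('"')
      ? trimmed.slice(1, -1)
      : trimmed;
  });
}

console.log(splitOutsideQuotes('a, b, c, "d, e, f", g, h'));
// [ 'a', 'b', 'c', 'd, e, f', 'g', 'h' ]

console.log(splitOutsideQuotes('a,,"x,y",'));
// [ 'a', '', 'x,y', '' ]
```

Unlike the match-based approach, split keeps empty fields. The lookahead rescans the rest of the line for every comma, so this is best suited to reasonably short lines.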

Here is a JavaScript function to do it:

function splitCSVButIgnoreCommasInDoublequotes(str) {  
    //split the str first  
    //then merge the elements between two double quotes
    var delimiter = ',';  
    var quotes = '"';  
    var elements = str.split(delimiter);  
    var newElements = [];  
    for (var i = 0; i < elements.length; ++i) {  
        if (elements[i].split(quotes).length % 2 === 0) {//an unmatched left double quote is found (odd number of quotes)
            var indexOfRightQuotes = -1;
            var tmp = elements[i];
            //find the piece that holds the matching right double quote
            for (var j = i + 1; j < elements.length; ++j) {
                if (elements[j].split(quotes).length % 2 === 0) {
                    indexOfRightQuotes = j; 
                    break;
                }  
            }  
            //found the right double quotes  
            //merge all the elements between double quotes  
            if (-1 != indexOfRightQuotes) {   
                for (var j = i + 1; j <= indexOfRightQuotes; ++j) {  
                    tmp = tmp + delimiter + elements[j];  
                }  
                newElements.push(tmp);  
                i = indexOfRightQuotes;  
            }  
            else { //right double quotes is not found  
                newElements.push(elements[i]);  
            }  
        }  
        else {//no left double quotes is found  
            newElements.push(elements[i]);  
        }  
    }  

    return newElements;  
}  

Here's a non-regex one that assumes doublequotes will come in pairs:

function splitCsv(str) {
  return str.split(',').reduce((accum, curr) => {
    if (accum.isConcatting) {
      accum.soFar[accum.soFar.length - 1] += ',' + curr
    } else {
      accum.soFar.push(curr)
    }
    if (curr.split('"').length % 2 == 0) {
      accum.isConcatting = !accum.isConcatting
    }
    return accum;
  }, { soFar: [], isConcatting: false }).soFar
}

console.log(splitCsv('asdf,"a,d",fdsa'), ' should be ', ['asdf', '"a,d"', 'fdsa'])
console.log(splitCsv(',asdf,,fds,'), ' should be ', ['', 'asdf', '', 'fds', ''])
console.log(splitCsv('asdf,"a,,,d",fdsa'), ' should be ', ['asdf', '"a,,,d"', 'fdsa'])

This works well for me. (I used semicolons so the alert message would show the difference between commas added when turning the array into a string and the actual captured values.)

REGEX

/("[^"]*")|[^;]+/


var str = 'a; b; c; "d; e; f"; g; h; "i"';
var array = str.match(/("[^"]*")|[^;]+/g); 
alert(array);
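One thing to watch with this pattern: when a separator is followed by a space, the [^;]+ alternative consumes that space and everything up to the next semicolon, so a quoted field after "; " is not matched as a whole. The sketch below shows this on the sample string and tries a variant; the variant regex is my own assumption, not the answer's, and it requires unquoted fields to start and end with a non-space character and to contain no quotes.

```javascript
const str = 'a; b; c; "d; e; f"; g; h; "i"';

// Pattern from the answer above. The quoted field is broken into
// ' "d', ' e' and ' f"', giving 9 pieces instead of 7.
const original = str.match(/("[^"]*")|[^;]+/g);
console.log(original.length); // 9

// Variant (an assumption): either a whole quoted field, or an unquoted run
// that neither starts nor ends with whitespace.
const FIELD = /"[^"]*"|[^;"\s]+(?:\s+[^;"\s]+)*/g;
const fields = str.match(FIELD) || [];
console.log(fields);
// [ 'a', 'b', 'c', '"d; e; f"', 'g', 'h', '"i"' ]
```

Like any match-based approach, the variant still skips empty fields.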

Here's the regex we're using to extract valid arguments from a comma-separated argument list, supporting double-quoted arguments. It works for the outlined edge cases. E.g.

  • doesn't include quotes in the matches
  • works with white spaces in matches
  • works with empty fields

(?<=")[^"]+?(?="(?:\s*?,|\s*?$))|(?<=(?:^|,)\s*?)(?:[^,"\s][^,"]*[^,"\s])|(?:[^,"\s])(?![^"]*?"(?:\s*?,|\s*?$))(?=\s*?(?:,|$))

Proof: https://regex101.com/r/UL8kyy/3/tests (Note: currently only works in Chrome, because the regex uses lookbehinds, which are only supported from ECMAScript 2018 on.)

According to our guidelines it avoids capturing groups and, where possible, greedy matching.

I'm sure it can be simplified; I'm open to suggestions / additional test cases.

For anyone interested, the first part matches double-quoted, comma-delimited arguments:

(?<=")[^"]+?(?="(?:\s*?,|\s*?$))

And the second part matches comma-delimited arguments by themselves:

(?<=(?:^|,)\s*?)(?:[^,"\s][^,"]*[^,"\s])|(?:[^,"\s])(?![^"]*?"(?:\s*?,|\s*?$))(?=\s*?(?:,|$))
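For reference, this is roughly how the full pattern could be applied with String.prototype.match. It is only a sketch: it needs an engine with lookbehind support, extractArgs is a hypothetical name, and I have only traced the single input shown, so test it against your own edge cases.

```javascript
// The full pattern from the answer above, with the global flag added.
const ARG_PATTERN = /(?<=")[^"]+?(?="(?:\s*?,|\s*?$))|(?<=(?:^|,)\s*?)(?:[^,"\s][^,"]*[^,"\s])|(?:[^,"\s])(?![^"]*?"(?:\s*?,|\s*?$))(?=\s*?(?:,|$))/g;

// extractArgs is a hypothetical helper name used for illustration.
function extractArgs(list) {
  // match() returns null when there are no matches at all.
  return list.match(ARG_PATTERN) || [];
}

console.log(extractArgs('ab, "c, d", e'));
// [ 'ab', 'c, d', 'e' ]
```

The quotes are left out of the matches because they are only asserted by the lookbehind and lookahead, never consumed.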

I almost liked the accepted answer, but it didn't handle spaces correctly, and/or it left the double quotes untrimmed, so here is my function:

    /**
     * Splits the given string into components, and returns the components array.
     * Each component must be separated by a comma.
     * If the component contains one or more comma(s), it must be wrapped with double quotes.
     * The double quote must not be used inside components (replace it with a special string like __double__quotes__ for instance, then transform it again into double quotes later...).
     *
     * https://stackoverflow.com/questions/11456850/split-a-string-by-commas-but-ignore-commas-within-double-quotes-using-javascript
     */
    function splitComponentsByComma(str){
        var ret = [];
        var arr = str.match(/(".*?"|[^",]+)(?=\s*,|\s*$)/g);
        for (const match of arr || []) {
            let element = match;
            if ('"' === element[0]) {
                element = element.slice(1, -1);
            } else {
                element = element.trim();
            }
            ret.push(element);
        }
        return ret;
    }
    console.log(splitComponentsByComma('Hello World, b, c, "d, e, f", c')); // [ 'Hello World', 'b', 'c', 'd, e, f', 'c' ]

I know it's a bit long, but here's my take:

var sample="[a, b, c, \"d, e, f\", g, h]";

var inQuotes = false, items = [], currentItem = '';

for(var i = 0; i < sample.length; i++) {
  if (sample[i] == '"') { 
    inQuotes = !inQuotes; 

    if (!inQuotes) {
      if (currentItem.length) items.push(currentItem);
      currentItem = '';
    }

    continue; 
  }

  if ((/^[\"\[\]\,\s]$/gi).test(sample[i]) && !inQuotes) {
    if (currentItem.length) items.push(currentItem);
    currentItem = '';
    continue;
  }

  currentItem += sample[i];
}

if (currentItem.length) items.push(currentItem);

console.log(items);

As a side note, it will work both with and without the square brackets at the start and end.

This takes a csv file one line at a time and returns an array with the commas inside quotation marks intact. If no quotation marks are detected, it just does a .split(",") as normal. The second loop could probably be replaced with something simpler, but it does the job as is.

function parseCSVLine(str){
    if(str.indexOf("\"")>-1){
        var aInputSplit = str.split(",");
        var aOutput = [];
        var iMatch = 0;
        //var adding = 0;
        for(var i=0;i<aInputSplit.length;i++){
            if(aInputSplit[i].indexOf("\"")>-1){
                var sWithCommas = aInputSplit[i];
                for(var z=i;z<aInputSplit.length;z++){
                    if(z !== i && aInputSplit[z].indexOf("\"") === -1){
                        sWithCommas+= ","+aInputSplit[z];
                    }else if(z !== i && aInputSplit[z].indexOf("\"") > -1){
                        sWithCommas+= ","+aInputSplit[z];
                        // replace() returns a new string, so the result must be assigned back
                        sWithCommas = sWithCommas.replace(new RegExp("\"", 'g'), "");
                        aOutput.push(sWithCommas);
                        i=z;
                        z=aInputSplit.length+1;
                        iMatch++;
                    }
                    if(z === aInputSplit.length-1){
                        if(iMatch === 0){
                            aOutput.push(aInputSplit[z]);
                        }                  
                        iMatch = 0;
                    }
                }
            }else{
                aOutput.push(aInputSplit[i]);
            }
        }
        return aOutput
    }else{
        return str.split(",")
    }
}
Parse any CSV or CSV string, based on TypeScript:

public parseCSV(content: string): string[][] {
        return content.split("\n").map(ar=>ar.split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/).map(refi=>refi.replace(/[\x00-\x08\x0E-\x1F\x7F-\uFFFF]/g, "").trim()));
    }

var str='"abc",jkl,1000,qwerty6000';

parseCSV(str);

Use the npm library csv-string to parse the strings instead of split: https://www.npmjs.com/package/csv-string

This will handle the empty entries.

Something like a stack should do the trick. Here I loosely use a marker boolean as the stack (it is enough to serve my purpose).

var str = "a,b,c,blah\"d,=,f\"blah,\"g,h,";
var getAttributes = function(str){
  var result = [];
  var strBuf = '';
  var start = 0 ;
  var marker = false;
  for (var i = 0; i< str.length; i++){

    if (str[i] === '"'){
      marker = !marker;
    }
    if (str[i] === ',' && !marker){
      result.push(str.substr(start, i - start));
      start = i+1;
    }
  }
  if (start <= str.length){
    result.push(str.substr(start, i - start));
  }
  return result;
};

console.log(getAttributes(str));


The code works if your input string is in the format of stringTocompare. Run the code on https://jsfiddle.net/ to see the output. Please refer to the screenshot [1]. Alternatively, you can use the regex split that is commented out in the code below and tweak the code as needed. If you don't want the comma put back when the pieces are re-joined, remove the +"," from the line attach=attach+","+actualString[t+1].

var stringTocompare='"Manufacturer","12345","6001","00",,"Calfe,eto,lin","Calfe,edin","4","20","10","07/01/2018","01/01/2006",,,,,,,,"03/31/2004"';

console.log(stringTocompare);

var actualString=stringTocompare.split(',');
console.log("Before");
for(var i=0;i<actualString.length;i++){
console.log(actualString[i]);
}
//var actualString=stringTocompare.split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/);
for(var i=0;i<actualString.length;i++){
    var flag=0;
    var x=actualString[i];
    if(x!==null)
    {
        // piece opens a quoted value but does not close it
        if(x[0]=='"' && x[x.length-1]!=='"'){
            var p=0;
            var t=i;
            var b=i;
            // count how many following pieces belong to the same quoted value
            for(var k=i;k<actualString.length;k++){
                var y=actualString[k];
                if(y[y.length-1]!=='"'){
                    p++;
                }
                if(y[y.length-1]=='"'){
                    flag=1;
                }
                if(flag==1)
                    break;
            }
            // re-join those pieces and drop them from the array
            var attach=actualString[t];
            for(var s=p;s>0;s--){
                attach=attach+","+actualString[t+1];
                t++;
            }
            actualString[i]=attach;
            actualString.splice(b+1,p);
        }
    }
}
console.log("After");
for(var i=0;i<actualString.length;i++){
console.log(actualString[i]);
}




  [1]: https://i.stack.imgur.com/3FcxM.png

I solved this with a simple parser.

It simply goes through the string char by char, splitting off a segment when it finds the split_char (e.g. comma), but also has an on/off flag which is switched by finding the encapsulator_char (e.g. quote). It doesn't require the encapsulator to be at the start of the field/segment (a,b","c,d would produce 3 segments, with 'b","c' as the second), but it should work for a well-formed CSV with escaped encapsulator chars.

function split_except_within(text, split_char, encapsulator_char, escape_char) {
    var start = 0
    var encapsulated = false
    var fields = []
    for (var c = 0; c < text.length; c++) {
        var char = text[c]
        if (char === split_char && ! encapsulated) {
            fields.push(text.substring(start, c))
            start = c+1
        }
        if (char === encapsulator_char && (c === 0 || text[c-1] !== escape_char) )             
            encapsulated = ! encapsulated
    }
    fields.push(text.substring(start))
    return fields
}

https://jsfiddle.net/7hty8Lvr/1/
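If your data escapes quotes by doubling them (the RFC 4180 convention, "" inside a quoted field) rather than with a separate escape character, the same char-by-char idea can be adapted. The following is only a sketch under that assumption; parseCsvLine is a hypothetical name, and unlike the function above it removes the surrounding quotes from the output.

```javascript
// parseCsvLine is a hypothetical name. Assumption: a literal quote inside a
// quoted field is written as two quotes (""), as in RFC 4180.
function parseCsvLine(line, delimiter = ',') {
  const fields = [];
  let current = '';
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') {
          current += '"'; // doubled quote: keep one literal quote
          i++;            // and skip the second one
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        current += ch; // delimiters inside quotes are ordinary characters
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      fields.push(current); // field boundary outside quotes
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current); // last field (may be empty)
  return fields;
}

console.log(parseCsvLine('a,"b,c",,"say ""hi""",d'));
// [ 'a', 'b,c', '', 'say "hi"', 'd' ]
```

Whitespace around fields is kept as-is here; add a trim() on unquoted fields if your input has spaces after the commas.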

const csvSplit = (line) => {
    let splitLine = [];

    var quotesplit = line.split('"');
    var lastindex = quotesplit.length - 1;
    // split evens removing outside quotes, push odds
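    quotesplit.forEach((val, index) => {
        if (index % 2 === 0) {
            // unquoted segment: drop the comma bordering each neighbouring quoted value
            var firstchar = (index == 0) ? 0 : 1;
            var lastchar = (index == lastindex) ? 0 : 1;
            // a segment holding nothing but its border commas contains no fields
            if (val.length < firstchar + lastchar) return;
            var trimmed = val.substring(firstchar, val.length - lastchar);
            trimmed.split(",").forEach(v => splitLine.push(v));
        } else {
            splitLine.push(val);
        }
    });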
    return splitLine;
}

This works as long as quotes always come on the outside of values that contain the commas that need to be excluded (i.e. a csv file).

If you have stuff like '1,2,4"2,6",8' it will not work.

I've had similar issues with this, and I found no good .NET solution, so I went DIY. NOTE: This was also used to reply to

Splitting comma separated string, ignore commas in quotes, but allow strings with one double quotation

but it seems more applicable here (though still useful over there).

In my application I'm parsing a csv, so my split character is ",". I suppose this method only works where you have a single-char split argument.

So, I've written a function that ignores commas within double quotes. It does so by converting the input string into a character array and parsing it char by char.

public static string[] Splitter_IgnoreQuotes(string stringToSplit)
    {   
        char[] CharsOfData = stringToSplit.ToCharArray();
        //enter your expected array size here or alloc.
        string[] dataArray = new string[37];
        int arrayIndex = 0;
        bool DoubleQuotesJustSeen = false;          
        foreach (char theChar in CharsOfData)
        {
            //did we just see double quotes, and no comma? don't split then. you could make ',' a variable for your split parameter; I'm working with a csv.
            if ((theChar != ',' || DoubleQuotesJustSeen) && theChar != '"')
            {
                dataArray[arrayIndex] = dataArray[arrayIndex] + theChar;
            }
            else if (theChar == '"')
            {
                if (DoubleQuotesJustSeen)
                {
                    DoubleQuotesJustSeen = false;
                }
                else
                {
                    DoubleQuotesJustSeen = true;
                }
            }
            else if (theChar == ',' && !DoubleQuotesJustSeen)
            {
                arrayIndex++;
            }
        }
        return dataArray;
    }

This function, to my application's taste, also ignores double quotes ("") in any input, as these are present in my input but unneeded.

Assuming your string really looks like '[a, b, c, "d, e, f", g, h]', I believe this would be an acceptable use case for eval():

myString = 'var myArr = ' + myString;
eval(myString);

console.log(myArr); // will now be an array of elements: a, b, c, "d, e, f", g, h

Edit: As Rocket pointed out, strict mode removes eval's ability to inject variables into the local scope, meaning you'd want to do this:

var myArr = eval(myString);
