Split comma separated string by comma

I need to split a string of country names separated by commas (,), where some of the country names themselves contain a comma too.

var str = "South Georgia and The South Sandwich Islands,Congo, Democratic Republic,Mauritania,Finland,Spain,Armenia,Mauritius,France,Sri Lanka,Aruba,Mayotte,French Guiana,Suriname,Australia,Mexico,French Polynesia,Svalbard and Jan Mayen,Austria,Micronesia, Federated States,French Southern Territories";

Expected result:

[   "South Georgia and The South Sandwich Islands",
    "Mexico",
    "French Polynesia",
    "Congo, Democratic Republic",
    "Svalbard and Jan Mayen",
    "Micronesia, Federated States",
]

Generally you don't want to use a character that can show up in valid country names, such as a comma, as the delimiter.
However, if we can assume that a comma has no spaces around it only when it is used as a delimiter, then we can use a regex to split the string:

var str = "South Georgia and The South Sandwich Islands,Congo, Democratic Republic,Mauritania,Finland,Spain,Armenia,Mauritius,France,Sri Lanka,Aruba,Mayotte,French Guiana,Suriname,Australia,Mexico,French Polynesia,Svalbard and Jan Mayen,Austria,Micronesia, Federated States,French Southern Territories";
// Split only on commas that have a word character directly on both sides.
var res = str.split(/(?<=\w),(?=\w)/);
console.log(res);

Regex explained:

  • (?<=\w) is a lookbehind for any "word" character.
  • (?=\w) is a lookahead for any "word" character.
  • , matches a comma character if and only if both the lookahead and the lookbehind succeed.

Interactive example: https://regexr.com/42b3e
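
To see what the two lookarounds contribute, here is a minimal sketch on a shortened input. The sample string and the variable names are made up for illustration, and the snippet assumes a runtime with lookbehind support, such as Node.

```javascript
// Shortened, made-up sample: one country name contains ", ".
const sample = "Congo, Democratic Republic,Mauritania,Sri Lanka";

// A plain split also breaks the country name that contains a comma.
const naive = sample.split(",");

// Lookaround split: only commas with a word character on both sides match.
const viaLookaround = sample.split(/(?<=\w),(?=\w)/);

console.log(naive);         // 4 items: "Congo" and " Democratic Republic" are separated
console.log(viaLookaround); // 3 items: "Congo, Democratic Republic" stays whole
```

The comma after "Congo" is followed by a space, so the lookahead fails there and that comma is kept as part of the name.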

Edit:

After looking into an issue brought up by @BorisSokolov in the comments, it has become clear that the regex implementation differs between the major JavaScript runtimes.

Runtimes tested:

  • V8 (includes Node): Works as intended.
  • SpiderMonkey: Throws SyntaxError: invalid regexp group. It turns out that, at the time of testing, Mozilla had not yet implemented positive lookbehind.
  • ChakraCore: Throws Script error. Same here: at the time of testing, Microsoft had not yet implemented positive lookbehind either.

Looking at TC39, we can see that positive lookbehind is part of the ES2018 spec, so it is expected to be implemented in all major browsers in the near future.
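
For code that may run on an engine without lookbehind, one option is to detect support at runtime and fall back to a lookahead-only pattern. The sketch below is only an illustration: supportsLookbehind and splitCountries are hypothetical names, and the fallback assumes that a delimiter comma is always followed directly by a word character.

```javascript
// Hypothetical helper: returns true when the engine can compile a lookbehind.
function supportsLookbehind() {
  try {
    // Build the pattern from a string so that an unsupported engine throws
    // here at runtime instead of failing to parse the whole script.
    new RegExp("(?<=a)b");
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical wrapper around the split described above.
function splitCountries(str) {
  if (supportsLookbehind()) {
    return str.split(new RegExp("(?<=\\w),(?=\\w)"));
  }
  // Fallback using lookahead only, which older engines already support.
  // Assumption: a comma inside a name is always followed by a space.
  return str.split(/,(?=\w)/);
}

console.log(splitCountries("Congo, Democratic Republic,Mauritania,Finland"));
// [ 'Congo, Democratic Republic', 'Mauritania', 'Finland' ]
```

Writing the lookbehind as a regex literal would not work for this purpose, because an engine that lacks the feature rejects the literal while parsing the script, before any try/catch can run.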

If the string has the same format as above, first split it on commas. Then, if an item in the resulting array begins with a space, merge that item with the previous item in the array. In country names that contain a comma, the comma is followed by a space, while the delimiter commas are not.
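
The split-then-merge idea can be sketched as follows. The function name splitAndMerge is hypothetical, and the code relies on the same assumption as the text: only commas inside a country name are followed by a space.

```javascript
// Hypothetical helper: split on every comma, then merge pieces that start
// with a space back into the previous item.
function splitAndMerge(str) {
  const result = [];
  for (const piece of str.split(",")) {
    if (piece.startsWith(" ") && result.length > 0) {
      // The comma belonged to the previous name, so glue the piece back on.
      result[result.length - 1] += "," + piece;
    } else {
      result.push(piece);
    }
  }
  return result;
}

console.log(splitAndMerge("Congo, Democratic Republic,Mauritania,Micronesia, Federated States"));
// [ 'Congo, Democratic Republic', 'Mauritania', 'Micronesia, Federated States' ]
```

This version needs no regular expression at all, so it runs on any JavaScript engine that supports for...of and String.prototype.startsWith.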

There is probably a way to do it with a regular expression, but I would suggest the easy way. Looking at your input, you can see that the commas inside a country name are followed by a space, whereas the listing commas are not:

var str = "South Georgia and The South Sandwich Islands,Congo, Democratic Republic,Mauritania,Finland,Spain,Armenia,Mauritius,France,Sri Lanka,Aruba,Mayotte,French Guiana,Suriname,Australia,Mexico,French Polynesia,Svalbard and Jan Mayen,Austria,Micronesia, Federated States,French Southern Territories";

So in order to tell those two apart, I would suggest replacing every ", " with a special character that does not occur in your input, say "$". Afterwards, you can split by ",". Then you can replace the special character back with ", ":

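function getCountryList(str) {
  // Use a global regex so that every ", " is replaced, not only the first one.
  var strWithSpecialCharacterReplaced = str.replace(/, /g, "$");
  var countryList = strWithSpecialCharacterReplaced.split(",");
  // Restore every placeholder in each item back to ", ".
  return countryList.map(countryString => countryString.replace(/\$/g, ", "));
}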

This is of course not the most performant solution, but it is one.
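
Because the approach depends on the special character not occurring in the input, a guard for that case may be worth adding. The following is a sketch only: splitWithPlaceholder and its placeholder parameter are hypothetical, and it uses split/join so that every occurrence is replaced and the placeholder is treated as plain text.

```javascript
// Hypothetical variant of the placeholder approach with a safety check.
function splitWithPlaceholder(str, placeholder = "$") {
  if (str.includes(placeholder)) {
    throw new Error("Placeholder already occurs in the input: " + placeholder);
  }
  // Protect every ", " so that the plain split below leaves it alone.
  const protectedStr = str.split(", ").join(placeholder);
  return protectedStr
    .split(",")
    // Restore the protected commas inside each country name.
    .map(name => name.split(placeholder).join(", "));
}

console.log(splitWithPlaceholder("Congo, Democratic Republic,Mauritania"));
// [ 'Congo, Democratic Republic', 'Mauritania' ]
```

The placeholder itself must not contain a comma, otherwise the middle split would cut it apart.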

Just use Regex!

var str = "South Georgia and The South Sandwich Islands,Congo, Democratic Republic,Mauritania,Finland,Spain,Armenia,Mauritius,France,Sri Lanka,Aruba,Mayotte,French Guiana,Suriname,Australia,Mexico,French Polynesia,Svalbard and Jan Mayen,Austria,Micronesia, Federated States,French Southern Territories";
// Split only on commas that have a word character directly on both sides.
var res = str.split(/(?<=\w),(?=\w)/);
console.log(res);

This example splits on every comma that has a letter or digit directly on both sides, so the ", " inside a country name is not split and the result is correct. The same idea can also be written with explicit character classes, as (?<=[a-zA-Z0-9])[,](?=[a-zA-Z0-9]). Tested and working!
