How to extract multiple strings from between special characters

I have a problem with defining the correct regex.

I need to split the text into groups so that I end up with one array of the digits that are inside braces and one array of the text that sits between those braces.

Example text:

{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.

And I want to have two arrays:

  1. [3,2,5,3]
  2. ["Lorem ipsum "dolor" sit amet,", "consectetur adipiscing elit.", "Sed semper; sollicitudin diam, "posuere"", "aliquet massa pulvinar nec."]

I almost made it, but I have a problem with special characters in the text (the brace characters themselves are prohibited in the input text). My present regex:

\{(.)\}+([\d\w\s]+)

And it returns:

  1. ["{3} Lorem ipsum", "{2} consectetur adipiscing elit", "{5}Sed semper", "{3}aliquet massa pulvinar nec"]

I know that I could later separate the numbers from the text on each array element using .split('}'), substring and so on (it won't be nice, but it will work).
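For reference, the behaviour described above can be reproduced with a shortened sample. This is only a sketch: `sample`, `widened` and the other variable names are made up for illustration. It suggests the match stops early because the class `[\d\w\s]` contains no punctuation, so a quote, comma or period ends the text group; a class that excludes only the opening brace keeps going.

```javascript
const sample = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit.`;

// The regex from the question: the text class only allows digits,
// word characters and whitespace, so punctuation ends the match.
const original = /\{(.)\}+([\d\w\s]+)/g;
const originalMatches = sample.match(original);

// Hypothetical variant: accept any character except an opening brace.
const widened = /\{(.)\}+([^{]+)/g;
const widenedMatches = sample.match(widened);

console.log(originalMatches);
console.log(widenedMatches);
```

With the `g` flag, `match()` returns whole matches only, so the braces and digits are still part of each element here.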

String.prototype.matchAll() returns an iterator of all matches and their capturing groups, which you can then use to populate your separate arrays.

const s = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;
const reg = /\{(\d+)\}(.*?)(?=\{|$)/g;
const matches = s.matchAll(reg);
const braces = [], text = [];
for (const match of matches) {
  const [_, b, t] = match;
  braces.push(b);
  text.push(t);
}
console.log(braces);
console.log(text);

Or mapped to an array of a shape of your choice.

const s = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;
const reg = /\{(\d+)\}(.*?)(?=\{|$)/g;
const matches = Array.from(s.matchAll(reg), ([_, digit, text]) => ({digit, text}));
console.log(matches);
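The captured groups come back as strings, and the text chunks keep the spaces that surround them in the input. A minimal sketch of getting closer to the two arrays requested in the question, assuming numeric digits and trimmed text are what is wanted (the names `input`, `numbers` and `texts` are illustrative only):

```javascript
const input = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;

// \{(\d+)\}   captures the digits inside the braces
// (.*?)       lazily captures the text that follows
// (?=\{|$)    stops before the next opening brace or at the end of input
const pattern = /\{(\d+)\}(.*?)(?=\{|$)/g;

const numbers = [];
const texts = [];
for (const [, digits, chunk] of input.matchAll(pattern)) {
  numbers.push(Number(digits)); // "3" -> 3
  texts.push(chunk.trim());     // drop leading/trailing whitespace
}

console.log(numbers);
console.log(texts);
```

One caveat: `.` does not match line breaks by default, so input that spans several lines would probably need the `s` (dotAll) flag on the pattern.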

This would do it:

var text = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;

// regex for all digits encased in {}
var numRegex = /\d+(?=\})/g;
var nums = text.match(numRegex);

// regex for everything not a digit encased in {}
var textRegex = /[^}]+(?=\{|$)/g;
var next_text = text.match(textRegex);

console.log(nums);
console.log(next_text);

You can use a similar regex and iterate over each match, appending each captured group to its result array, like this:

let str = '{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.'

let regex = /\{(\d+)\}([^{]+)/g // \d+ instead of . so multi-digit numbers such as {12} also match

let match = regex.exec(str)
let arr1 = []
let arr2 = []
while(match != null){
    arr1.push(match[1])
    arr2.push(match[2])
    match = regex.exec(str)
}
console.log(arr1)
console.log(arr2)
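A different technique from the answers above, offered only as a sketch: `String.prototype.split()` includes captured groups in its result, so splitting on the brace pattern yields digits and text alternately. This assumes the input starts with a brace group; the variable names are illustrative.

```javascript
const input = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;

// Because the pattern has a capturing group, split() keeps the digits.
// The first element is whatever precedes the first brace (empty here),
// so it is dropped with slice(1).
const parts = input.split(/\{(\d+)\}/).slice(1);

const numbers = [];
const texts = [];
for (let i = 0; i < parts.length; i += 2) {
  numbers.push(Number(parts[i]));   // even index: digits from the braces
  texts.push(parts[i + 1].trim());  // odd index: the text that follows
}

console.log(numbers);
console.log(texts);
```

If text may appear before the first brace group, the dropped first element would need to be handled separately rather than discarded.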
