
How to extract multiple strings from between special characters

I have a problem with defining the correct regex.

I need to split the text so that I end up with one array containing the numbers inside the braces and another array containing the text between those braces.

Example text:

{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.

I want to get two arrays:

  1. [3,2,5,3]
  2. ['Lorem ipsum "dolor" sit amet,', 'consectetur adipiscing elit.', 'Sed semper; sollicitudin diam, "posuere"', 'aliquet massa pulvinar nec.']

I almost have it working, but I have a problem with the special characters in the text (brace characters are not allowed in the input text). My current regex:

\{(.)\}+([\d\w\s]+)

And it returns:

  1. ["{3} Lorem ipsum", "{2} consectetur adipiscing elit", "{5}Sed semper", "{3}aliquet massa pulvinar nec"]
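
To see why the matches stop early: the character class [\d\w\s] only admits letters, digits, the underscore and whitespace, so each match ends at the first quote, comma, period or semicolon. The sketch below compares it with a negated class [^{]+. It assumes, as the question states, that { never appears inside the text, and it also swaps (.) for (\d+) so that multi-digit numbers would be captured too.

```javascript
const s = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;

// The regex from the question: the text part only accepts [A-Za-z0-9_] and whitespace
const original = /\{(.)\}+([\d\w\s]+)/g;
// Assumed fix: one or more digits in the braces, then anything except "{" as the text
const widened = /\{(\d+)\}([^{]+)/g;

const originalMatches = s.match(original);
const widenedMatches = s.match(widened);

console.log(originalMatches); // the first entry stops right before the quote
console.log(widenedMatches);  // each entry runs up to the next "{"
```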

I know I could afterwards separate the numbers from the text on each array element using .split('}'), substring() and so on (it won't be pretty, but it will work).
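
That split-based workaround can also be done without any regex at all. A minimal sketch, again assuming no braces inside the text: split on {, drop the empty chunk before the first brace, then split each chunk once on }. The names chunks, numbers and texts are only illustrative.

```javascript
const s = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;

// Splitting on "{" gives an empty first chunk (the text before the first brace)
// followed by chunks such as '3} Lorem ipsum "dolor" sit amet, '; drop the empty one.
const chunks = s.split('{').filter((chunk) => chunk.length > 0);

const numbers = [];
const texts = [];
for (const chunk of chunks) {
  // Before "}" is the number, after it is the text (assumes exactly one "}" per chunk)
  const [num, text] = chunk.split('}');
  numbers.push(Number(num));
  texts.push(text.trim());
}

console.log(numbers);
console.log(texts);
```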

String.prototype.matchAll() returns an iterator over all matches and their capturing groups, which you can then use to populate your separate arrays.

const s = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;
// Capture the digits inside the braces, then lazily take the text up to the next "{" or the end of the string
const reg = /\{(\d+)\}(.*?)(?=\{|$)/g;
const matches = s.matchAll(reg);
const braces = [], text = [];
for (const match of matches) {
  const [, b, t] = match;
  braces.push(Number(b)); // convert "3" to 3
  text.push(t.trim());    // drop the spaces around each fragment
}
console.log(braces);
console.log(text);
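
Two properties of matchAll() that this relies on, shown in a small sketch (the short input string is made up for illustration): it throws a TypeError when the regex lacks the g flag, and the iterator it returns can be consumed only once, so a second pass over the same iterator yields nothing.

```javascript
const s = '{3} Lorem ipsum {2} dolor';

// Without the g flag, matchAll() throws immediately instead of returning an iterator
let caught = null;
try {
  s.matchAll(/\{(\d+)\}(.*?)(?=\{|$)/);
} catch (e) {
  caught = e;
}
console.log(caught instanceof TypeError);

// The iterator is single-use: spreading it a second time yields no matches
const iterator = s.matchAll(/\{(\d+)\}(.*?)(?=\{|$)/g);
const firstPass = [...iterator].map((m) => m[1]);
const secondPass = [...iterator].map((m) => m[1]);
console.log(firstPass, secondPass);
```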

Or map the matches to an array of objects of whatever shape you need:

const s = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;
const reg = /\{(\d+)\}(.*?)(?=\{|$)/g;
// Turn each match into a { digit, text } object
const matches = Array.from(s.matchAll(reg), ([_, digit, text]) => ({digit, text}));
console.log(matches);
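
If positional destructuring feels fragile, the same pattern can use named capture groups (available since ES2018) and read them from match.groups. This is a variant sketch, not part of the original answer; the group names digit and text are arbitrary, and the numbers are converted and the text trimmed to match the arrays requested in the question.

```javascript
const s = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;

// Same pattern as in the answer, with the two groups named
const reg = /\{(?<digit>\d+)\}(?<text>.*?)(?=\{|$)/g;

const matches = Array.from(s.matchAll(reg), (m) => ({
  digit: Number(m.groups.digit), // "3" becomes 3
  text: m.groups.text.trim(),    // strip the surrounding spaces
}));

console.log(matches);
```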

This would do it:

const text = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;
// Digits that are immediately followed by a closing brace
const numRegex = /\d+(?=\})/g;
const nums = text.match(numRegex);
// Runs of characters other than "}" that end right before the next "{" or the end of the string
const textRegex = /[^}]+(?=\{|$)/g;
const nextText = text.match(textRegex);
console.log(nums);
console.log(nextText);
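
Two caveats with the two-regex approach, shown in a sketch with a hypothetical helper named extractWithMatch: String.prototype.match() returns null rather than an empty array when nothing matches, and the two arrays only line up when the string starts with a {n} marker, because any text before the first brace is picked up as an extra element.

```javascript
// Hypothetical helper; the two regexes are the ones from the answer above
function extractWithMatch(input) {
  // match() with the g flag returns null when nothing matches, hence the ?? []
  const nums = (input.match(/\d+(?=\})/g) ?? []).map(Number);
  const texts = (input.match(/[^}]+(?=\{|$)/g) ?? []).map((t) => t.trim());
  return { nums, texts };
}

const s = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;

const aligned = extractWithMatch(s);                // four numbers, four texts
const empty = extractWithMatch('');                 // both empty; without ?? [] .map would be called on null
const misaligned = extractWithMatch('intro {1} a'); // one number but two texts ("intro" and "a")
console.log(aligned, empty, misaligned);
```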

You can use a similar regex and iterate over the matches, pushing each captured group into its own result array, like this:

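const str = '{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.';

// (\d+) instead of (.) so that multi-digit numbers such as {12} also match
const regex = /\{(\d+)\}([^{]+)/g;

const arr1 = [];
const arr2 = [];
// With the g flag, each exec() call resumes where the previous one stopped and returns null when done
let match = regex.exec(str);
while (match !== null) {
    arr1.push(match[1]);
    arr2.push(match[2].trim()); // drop the spaces around each fragment
    match = regex.exec(str);
}
console.log(arr1);
console.log(arr2);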
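
The loop works because a global regex remembers where the previous exec() call stopped in its lastIndex property, and a failed search resets it to 0. That state can leak if the same regex object is reused on a different string before a search has run to completion. A sketch with a hypothetical splitBraces() helper that creates a fresh regex per call, followed by a demonstration of the leak:

```javascript
// Hypothetical helper: a regex literal inside the function creates a new RegExp
// object on every call, so lastIndex always starts at 0
function splitBraces(str) {
  const regex = /\{(\d+)\}([^{]+)/g;
  const numbers = [];
  const texts = [];
  let match;
  while ((match = regex.exec(str)) !== null) {
    numbers.push(Number(match[1]));
    texts.push(match[2].trim());
  }
  return { numbers, texts };
}

const s = `{3} Lorem ipsum "dolor" sit amet, {2} consectetur adipiscing elit. {5}Sed semper; sollicitudin diam, "posuere" {3}aliquet massa pulvinar nec.`;
const result = splitBraces(s);
const multi = splitBraces('{12}x{3}y');
console.log(result, multi);

// The leak: a shared global regex continues from the previous lastIndex
const shared = /\{(\d+)\}([^{]+)/g;
shared.exec('{1} a {2} b');
const afterFirst = shared.lastIndex; // 6, the end of "{1} a "
const leaked = shared.exec('{3} c'); // null, because the search starts past the end of "{3} c"
console.log(afterFirst, leaked);
```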

