
Not pushing correct regex match to JSON using javascript

Overview:

I'm using regex to parse a text document and create a JSON document. The document is parsed from console logs.

What seems to happen is that the condition (regex_1_match && regex_2_match) is not working as expected. It seems to match regex_1, then look for anything that fulfills regex_2, and save both in the same array entry.

const fs = require('fs');
const filename = fs.readFileSync('test.txt').toString();
var regex_1 = /"Course([0-9.])"/g;
var regex_2 = /"(Name)"/g;
var regex_3 = /"(No Name)"/g;
var regex_1_match = filename.match(regex_1);
var regex_2_match = filename.match(regex_2);
var regex_3_match = filename.match(regex_3);

let testJSON = [];

//for each line item
for  (let index = 0; index < filename.length; index++) {
  if(regex_1_match && regex_2_match) {
    testJSON.push({
      Course: regex_1[index]
      Name: regex_2[index]
    });
  }
}
fs.writeFileSync("parsed_test_doc",JSON.stringify(testJSON));

test.txt:

------------ Course1 ------------
------------ foo ------------
------------ Name ------------
------------ Course2 ------------
------------ foo ------------
------------ No Name ------------
------------ Course3 ------------
------------ Name ------------
------------ foo ------------
------------ Course4 ------------
------------ No Name ------------
------------ Course5 ------------
------------ foo ------------
------------ Name ------------

Output:

[{
  "Course": "Course1",
  "Name": "Name"
}, {
  "Course": "Course2",
  "Name": "Name"
}, {
  "Course": "Course3",
  "Name": "Name"
}, {
  "Course": "Course4"
}, {
  "Course": "Course5"
}]

Expected Output:

[{
  "Course": "Course1",
  "Name": "Name"
}, {
  "Course": "Course2"
}, {
  "Course": "Course3",
  "Name": "Name"
}, {
  "Course": "Course4"
}, {
  "Course": "Course5",
  "Name": "Name"
}]

A few notes about the example code

  • In the current code, the condition regex_1_match && regex_2_match only checks that both calls to match returned an array rather than null. Both calls run against the whole file, so the condition has the same value on every iteration of the loop, which can give you unexpected results
  • Note that the variable filename contains the content of the whole file, so index in this loop runs over every character position in the file content, not over the lines
  • Indexing into the regex itself, as in regex_1[index], does not work this way; perhaps you meant to index into the match result, such as regex_1_match[index]
  • regex_3_match is never used
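
The notes above can be checked with a small sketch. The sample text is a shortened, inlined stand-in for test.txt, and the two patterns are simplified versions chosen for illustration, not the ones from the question.

```javascript
// Shortened stand-in for the contents of test.txt (an assumption for this sketch).
const text = [
  "------------ Course1 ------------",
  "------------ foo ------------",
  "------------ Name ------------",
  "------------ Course2 ------------",
  "------------ No Name ------------",
].join("\n");

// With the g flag, match returns all matches in the whole string, or null.
const courses = text.match(/Course\d+/g);
const names = text.match(/\bName\b/g);

console.log(courses); // [ 'Course1', 'Course2' ]
console.log(names);   // [ 'Name', 'Name' ] (the second one comes from "No Name")

// Both results are non-empty arrays, so the condition is true no matter
// which course the loop is currently looking at.
const bothTruthy = Boolean(courses && names);
console.log(bothTruthy); // true

// length counts characters, not lines.
const charCount = text.length;
const lineCount = text.split("\n").length;
console.log(charCount > lineCount); // true

// Indexing a RegExp object by number yields undefined.
const indexedRegex = /Course\d+/g[0];
console.log(indexedRegex); // undefined
```

Because the two match arrays are built independently, there is no link between the n-th course and the n-th name, which is why a single pattern that pairs them is easier to work with.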

What you might do is use a single pattern with 2 capture groups, where group 1 captures the course and the optional group 2 captures the first occurrence of Name that follows it.

\b(Course\d+)(?:(?!Course\d)[^])*?(?:No Name|(Name)|(?=Course\d))

The pattern matches:

  • \b A word boundary
  • (Course\d+) Capture in group 1 Course and 1+ digits
  • (?: Non capture group
    • (?!Course\d)[^] Match any character, including a newline, if Course and a digit are not directly to the right
  • )*? Close the non capture group and optionally repeat it, as few times as possible
  • (?: Non capture group for the alternatives
    • No Name|(Name)|(?=Course\d) Match No Name, or capture Name in group 2, or assert that the next Course and a digit are directly to the right, which ends the match when there is no Name present.
  • ) Close non capture group
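
As a rough illustration of the three alternatives, here is a minimal sketch on a one-line sample string. The sample and the variable names are made up for this example.

```javascript
const pattern = /\b(Course\d+)(?:(?!Course\d)[^])*?(?:No Name|(Name)|(?=Course\d))/g;

// Hypothetical sample: Course1 is followed by Name, Course2 by No Name,
// Course3 by neither, and Course4 by Name.
const sample = "Course1 foo Name Course2 foo No Name Course3 foo Course4 Name";

// Collect group 1 and group 2 for every match.
const groups = Array.from(sample.matchAll(pattern), m => [m[1], m[2]]);
console.log(groups);
// [
//   [ 'Course1', 'Name' ],
//   [ 'Course2', undefined ],
//   [ 'Course3', undefined ],
//   [ 'Course4', 'Name' ]
// ]

// A course at the very end of the input that is followed by neither
// Name, No Name, nor another Course does not match at all.
const trailing = Array.from("Course9 foo".matchAll(pattern));
console.log(trailing.length); // 0
```

Group 2 is undefined whenever the match ended on No Name or on the lookahead. The last check shows one limitation to be aware of: every alternative needs something after the course, so a trailing course without a Name or No Name line would be skipped.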


const fs = require('fs');
// Despite its name, this variable holds the content of the file
const filename = fs.readFileSync('test.txt').toString();
const regex = /\b(Course\d+)(?:(?!Course\d)[^])*?(?:No Name|(Name)|(?=Course\d))/g;
// matchAll yields one match per course; group 2 is undefined when no Name was captured
const result = Array.from(filename.matchAll(regex), m => {
    const res = {"Course": m[1]};
    if (undefined !== m[2]) {
        res["Name"] = m[2];
    }
    return res;
});
console.log(result);

Output

[
  { Course: 'Course1', Name: 'Name' },
  { Course: 'Course2' },
  { Course: 'Course3', Name: 'Name' },
  { Course: 'Course4' },
  { Course: 'Course5', Name: 'Name' }
]

const s = `------------ Course1 ------------
------------ foo ------------
------------ Name ------------
------------ Course2 ------------
------------ foo ------------
------------ No Name ------------
------------ Course3 ------------
------------ Name ------------
------------ foo ------------
------------ Course4 ------------
------------ No Name ------------
------------ Course5 ------------
------------ foo ------------
------------ Name ------------`;
const regex = /\b(Course\d+)(?:(?!Course\d)[^])*?(?:No Name|(Name)|(?=Course\d))/g;
const result = Array.from(s.matchAll(regex), m => {
  let res = { "Course": m[1] };
  if (undefined !== m[2]) {
    res["Name"] = m[2];
  }
  return res;
});
console.log(result);
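
To get back to the goal in the question of producing a JSON document, the parsed array could be passed to JSON.stringify and written with fs.writeFileSync. The following is only a sketch: the helper name parseCourses, the shortened sample text, and the output location in the temp directory are assumptions for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const regex = /\b(Course\d+)(?:(?!Course\d)[^])*?(?:No Name|(Name)|(?=Course\d))/g;

// Hypothetical helper: turn the log text into an array of course objects.
function parseCourses(text) {
  return Array.from(text.matchAll(regex), m => {
    const res = { Course: m[1] };
    // Only add Name when group 2 took part in the match.
    if (m[2] !== undefined) {
      res.Name = m[2];
    }
    return res;
  });
}

// Shortened stand-in for the contents of test.txt.
const sample = [
  "------------ Course1 ------------",
  "------------ foo ------------",
  "------------ Name ------------",
  "------------ Course2 ------------",
  "------------ No Name ------------",
].join("\n");

// Write the JSON to a file in the temp directory (assumed location).
const outFile = path.join(os.tmpdir(), "parsed_test_doc.json");
fs.writeFileSync(outFile, JSON.stringify(parseCourses(sample), null, 2));

// Read it back to confirm the file contains valid JSON.
const roundTrip = JSON.parse(fs.readFileSync(outFile, "utf8"));
console.log(roundTrip);
// [ { Course: 'Course1', Name: 'Name' }, { Course: 'Course2' } ]
```

In the real script, sample would be replaced by the content read from test.txt and outFile by the path used in the question.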
