Overview:
I'm using regex to parse a text document (taken from console logs) and create a JSON document from it.
The problem seems to be that (regex_1_match && regex_2_match) does not work as expected. It matches regex_1, appears to satisfy regex_2 as well, and saves both into the same array entry.
const fs = require('fs');
const filename = fs.readFileSync('test.txt').toString();
var regex_1 = /"Course([0-9.])"/g;
var regex_2 = /"(Name)"/g;
var regex_3 = /"(No Name)"/g;
var regex_1_match = filename.match(regex_1);
var regex_2_match = filename.match(regex_2);
var regex_3_match = filename.match(regex_3);
let testJSON = [];
//for each line item
for (let index = 0; index < filename.length; index++) {
  if (regex_1_match && regex_2_match) {
    testJSON.push({
      Course: regex_1[index],
      Name: regex_2[index]
    });
  }
}
fs.writeFileSync("parsed_test_doc", JSON.stringify(testJSON));
test.txt:
------------ Course1 ------------
------------ foo ------------
------------ Name ------------
------------ Course2 ------------
------------ foo ------------
------------ No Name ------------
------------ Course3 ------------
------------ Name ------------
------------ foo ------------
------------ Course4 ------------
------------ No Name ------------
------------ Course5 ------------
------------ foo ------------
------------ Name ------------
Output:
[{
"Course": "Course1",
"Name": "Name"
}, {
"Course": "Course2",
"Name": "Name"
}, {
"Course": "Course3",
"Name": "Name"
}, {
"Course": "Course4"
}, {
"Course": "Course5"
}]
Expected Output:
[{
"Course": "Course1",
"Name": "Name"
}, {
"Course": "Course2"
}, {
"Course": "Course3",
"Name": "Name"
}, {
"Course": "Course4"
}, {
"Course": "Course5",
"Name": "Name"
}]
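To check the notes that follow against real values, here is a small sketch you can run in Node.js. It uses a short hypothetical string instead of test.txt, and all variable names in it are illustrative. It also drops the literal double quotes from the question's patterns. This is an assumption on my part: as written, /"Course([0-9.])"/g only matches text wrapped in double quotes, and test.txt contains none.

```javascript
// Hypothetical sample standing in for test.txt (assumption).
const text = [
  "------------ Course1 ------------",
  "------------ Name ------------",
].join("\n");

// The question's patterns without the literal quotes (assumption), so they can match.
const courseRegex = /Course([0-9.])/g;
const nameRegex = /(Name)/g;

// With the g flag, match() returns an array of all matched strings, or null.
const courseMatch = text.match(courseRegex); // ["Course1"]
const nameMatch = text.match(nameRegex);     // ["Name"]

// Any array is truthy, so this value is the same on every loop iteration.
const conditionHolds = Boolean(courseMatch && nameMatch);

// length counts characters, not lines.
const charCount = text.length;
const lineCount = text.split("\n").length;

// A RegExp object is not an array: indexing it yields undefined.
const fromRegex = courseRegex[0];
// Indexing the match result is what returns the matched text.
const fromMatch = courseMatch[0];

console.log({ conditionHolds, charCount, lineCount, fromRegex, fromMatch });
```

So the loop in the question runs once per character, the condition never changes, and every pushed object gets undefined values.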
A few notes about the example code:
- regex_1_match && regex_2_match is truthy whenever both match results are arrays. match returns either an array or null, so the condition does not depend on index at all, which can give you unexpected results
- filename contains the whole file, so index in this loop runs over every character of the file content, not over the lines
- regex_1[index] does not work this way: regex_1 is a RegExp object, not an array, so indexing it gives undefined. Perhaps you meant to index into the match result instead
- regex_3_match is never used

What you might do is use a single pattern with 2 capture groups, where group 2 captures the first occurrence of Name and is optional.
\b(Course\d+)(?:(?!Course\d)[^])*?(?:No Name|(Name)|(?=Course\d))
The pattern matches:
- \b  A word boundary
- (Course\d+)  Capture Course and 1+ digits in group 1
- (?:  Non capture group
  - (?!Course\d)[^]  Match any char (including a newline) if Course and a digit are not directly to the right
- )*?  Close the non capture group and optionally repeat it, non greedy
- (?:  Non capture group for the alternatives
  - No Name|(Name)|(?=Course\d)  Match No Name, or capture Name in group 2, or assert that the next Course and a digit follow, to end the match when there is no Name present
- )  Close the non capture group

const fs = require('fs');
const filename = fs.readFileSync('test.txt').toString();
const regex = /\b(Course\d+)(?:(?!Course\d)[^])*?(?:No Name|(Name)|(?=Course\d))/g;
// Turn every match into { Course } or { Course, Name }
const result = Array.from(filename.matchAll(regex), m => {
  let res = {"Course": m[1]};
  // Group 2 is only set when "Name" (not "No Name") was found
  if (undefined !== m[2]) {
    res["Name"] = m[2];
  }
  return res;
});
console.log(result);
Output
[
{ Course: 'Course1', Name: 'Name' },
{ Course: 'Course2' },
{ Course: 'Course3', Name: 'Name' },
{ Course: 'Course4' },
{ Course: 'Course5', Name: 'Name' }
]
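The (?=Course\d) alternative is what keeps a course without any Name line from borrowing the next course's Name. A minimal sketch, assuming a hypothetical input where Course1 only has a foo line; the "naive" pattern is a simplified variant made up for comparison, not something from the question.

```javascript
// Hypothetical input: Course1 has neither "Name" nor "No Name" (assumption).
const s = [
  "------------ Course1 ------------",
  "------------ foo ------------",
  "------------ Course2 ------------",
  "------------ Name ------------",
].join("\n");

// Without the (?=Course\d) alternative, the lazy scan runs on into Course2's block.
const naive = /\b(Course\d+)[^]*?(?:No Name|(Name))/g;
// The pattern from this answer: stop at the next CourseN when no name was found.
const bounded = /\b(Course\d+)(?:(?!Course\d)[^])*?(?:No Name|(Name)|(?=Course\d))/g;

// Build { Course } or { Course, Name } objects, like the answer's code does.
const toObjects = (re) =>
  Array.from(s.matchAll(re), (m) =>
    m[2] !== undefined ? { Course: m[1], Name: m[2] } : { Course: m[1] }
  );

const naiveResult = toObjects(naive);     // Course1 takes Course2's Name; Course2 is lost
const boundedResult = toObjects(bounded); // Course1 without Name, Course2 with Name
console.log(naiveResult, boundedResult);
```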
const s = `------------ Course1 ------------
------------ foo ------------
------------ Name ------------
------------ Course2 ------------
------------ foo ------------
------------ No Name ------------
------------ Course3 ------------
------------ Name ------------
------------ foo ------------
------------ Course4 ------------
------------ No Name ------------
------------ Course5 ------------
------------ foo ------------
------------ Name ------------`;
const regex = /\b(Course\d+)(?:(?!Course\d)[^])*?(?:No Name|(Name)|(?=Course\d))/g;
const result = Array.from(s.matchAll(regex), m => {
  let res = { "Course": m[1] };
  if (undefined !== m[2]) {
    res["Name"] = m[2];
  }
  return res;
});
console.log(result);
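One consequence of relying on the next Course to end a match: if the last course in the file has neither Name nor No Name, nothing follows it for (?=Course\d) to match, so that course is left out. A minimal sketch with hypothetical input; the |$ alternative is a possible tweak, not part of the answer above, and $ only means "end of input" here because the m flag is not set.

```javascript
// Hypothetical input: the last course has neither "Name" nor "No Name" (assumption).
const s = [
  "------------ Course1 ------------",
  "------------ Name ------------",
  "------------ Course2 ------------",
  "------------ foo ------------",
].join("\n");

// The answer's pattern: needs a following CourseN (or a name) to finish a match.
const original = /\b(Course\d+)(?:(?!Course\d)[^])*?(?:No Name|(Name)|(?=Course\d))/g;
// Possible tweak (not from the answer): also accept the end of the input.
const tweaked = /\b(Course\d+)(?:(?!Course\d)[^])*?(?:No Name|(Name)|(?=Course\d)|$)/g;

// Collect only the course names that were matched.
const courses = (re) => Array.from(s.matchAll(re), (m) => m[1]);

const withOriginal = courses(original); // Course2 is missing
const withTweak = courses(tweaked);     // both courses are found
console.log(withOriginal, withTweak);
```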