I have a file with the following structure. It's not XML, but I need to convert it to JSON somehow.
So while I would expect the file to look like this:
<chapter>
<line> Some text which I want to grab. </line>
<line> Some more text which I want to grab. </line>
<line> Even more text which I want to grab. </line>
</chapter>
It is in fact structured like this:
<chapter>
<line /> Some text which I want to grab.
<line /> Some more text which I want to grab.
<line /> Even more text which I want to grab.
</chapter>
So the 'lines' of each chapter just stand next to the self-closing line tags. Can you recommend a method of grabbing these? Possibly in javascript / nodejs?
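The format is valid XML, so you can use the regular XML techniques, i.e. DOMParser, to parse the content. (DOMParser is a browser API; in Node.js you would need a package that provides a DOM implementation.)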
However, you need to be a bit clever about parsing the lines: find each line element, then gather up all the sibling nodes immediately following it that are text nodes (there should be only one, but the code I present doesn't assume that).
You didn't specify the output structure, but here's one method that outputs a nested array: the first level is chapters, and each chapter holds an array of its lines.
var xml = `<chapter>
<line /> Some text which I want to grab.
<line /> Some more text which I want to grab.
<line /> Even more text which I want to grab.
</chapter>`;
var parser = new DOMParser();
var content = parser.parseFromString(xml, 'application/xml');
var chapters = content.getElementsByTagName('chapter');
// Outer reduce: one array entry per chapter
var obj = [].reduce.call(chapters, function(result, chapter) {
    var lines = chapter.getElementsByTagName('line');
    // Inner reduce: one string per <line /> in this chapter
    result.push([].reduce.call(lines, function(texts, line) {
        var text = '';
        // Concatenate every text node (nodeType 3) that directly follows the <line />
        for (var node = line.nextSibling; node && node.nodeType == 3; node = node.nextSibling) {
            text += node.nodeValue;
        }
        texts.push(text);
        return texts;
    }, []));
    return result;
}, []);
console.log(JSON.stringify(obj));
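Since the question mentions Node.js, where DOMParser is not built in, here is a minimal dependency-free sketch of the same idea. It is not a real XML parser: it assumes the file contains only flat `<chapter>` blocks whose text follows self-closing `<line />` tags, with no attributes, nesting, comments, or entities. The name `parseChapters` is hypothetical, and unlike the DOM version above it trims the whitespace around each line.

```javascript
// Hypothetical helper: returns an array of chapters, each an array of line strings.
// Assumes flat <chapter> blocks and attribute-free <line /> tags (not general XML).
function parseChapters(source) {
  const chapters = [];
  const chapterPattern = /<chapter>([\s\S]*?)<\/chapter>/g;
  let match;
  while ((match = chapterPattern.exec(source)) !== null) {
    const lines = match[1]
      .split(/<line\s*\/>/)   // the text after each tag becomes its own piece
      .slice(1)               // drop whatever precedes the first <line />
      .map(function (text) { return text.trim(); });
    chapters.push(lines);
  }
  return chapters;
}

const sample = `<chapter>
<line /> Some text which I want to grab.
<line /> Some more text which I want to grab.
</chapter>`;

console.log(JSON.stringify(parseChapters(sample)));
// [["Some text which I want to grab.","Some more text which I want to grab."]]
```

For anything less regular than this sample, a proper XML parser is the safer choice.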
Addressing the comments, here is what [].reduce.call(array, fn) does in this code.
[].reduce.call is shorthand for Array.prototype.reduce.call, and getElementsByTagName returns an HTMLCollection, which behaves like an array but isn't one. There are several ways to make an array out of an HTMLCollection. The most primitive:
var array = [];
for (var i = 0; i < collection.length; i++) {
    array[i] = collection[i];
}
or
var array = Array.prototype.slice.call(collection);
or (ES2015+; not available in IE unless you polyfill)
var array = Array.from(collection);
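To try these conversions outside a browser, a plain object with indexed entries and a length can stand in for an HTMLCollection. The `collection` object below is such a stand-in (an assumption for illustration, not a real DOM object), and the variable names are hypothetical.

```javascript
// Stand-in for an HTMLCollection: indexed entries plus a length, but no array methods.
const collection = { 0: 'a', 1: 'b', 2: 'c', length: 3 };

// 1. Manual copy
const viaLoop = [];
for (let i = 0; i < collection.length; i++) {
  viaLoop[i] = collection[i];
}

// 2. Borrow Array.prototype.slice
const viaSlice = Array.prototype.slice.call(collection);

// 3. ES2015+ Array.from, which also accepts array-like objects
const viaFrom = Array.from(collection);

console.log(Array.isArray(collection)); // false
console.log(viaLoop, viaSlice, viaFrom); // three real arrays with the same elements
```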
However, using the .call method on [].reduce allows the first argument (the this argument) to be any array-like object (anything with a length and indexed elements), not just an array. So it works just like array.reduce(fn) with array from above: it's a way to treat the HTMLCollection like an array, without the need for an intermediate variable.
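As a small illustration of that equivalence, the sketch below runs the same reducer both ways. The `arrayLike` object is a hypothetical stand-in for an HTMLCollection, and `collect` is an illustrative reducer, not part of the code above.

```javascript
// Array-like stand-in: it has a length and indexed entries, but no reduce method.
const arrayLike = { 0: 'first', 1: 'second', length: 2 };

// Illustrative reducer: gathers an upper-cased copy of each item.
function collect(result, item) {
  result.push(item.toUpperCase());
  return result;
}

// Borrow reduce directly, with no intermediate array
const direct = [].reduce.call(arrayLike, collect, []);

// Convert first, then reduce
const viaArray = Array.from(arrayLike).reduce(collect, []);

console.log(typeof arrayLike.reduce); // "undefined"
console.log(direct);                  // [ 'FIRST', 'SECOND' ]
console.log(viaArray);                // [ 'FIRST', 'SECOND' ]
```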