Language: Node.js
I'm using a text editor, and it gives me an output string like this:
<p>This is <strong>a <a href="#">test</a></strong></p>
The markup could contain other HTML tags such as h1, h2, etc., but nothing more special than ordinary HTML text tags.
Now I want to turn that string into an object that I can work with and send to my database. Ideally, it would be transformed into something like this:
[{type: "text", text: "This is ", bold: false}, {type: "text", text: "a ", bold: true}, {type: "link", text: "test", bold: true, href: "#"}]
and so on.
I tried a regex approach, splitting the string and applying all sorts of logic to build a structured object. But that can't be the best way to do it, since it would fail if, for example, I later wrote <h1>Test</h1> in the middle of the text.
How would you approach this?
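If you want to keep it simple, a real HTML parser such as jsdom, or htmlparser2 together with domhandler, will do the heavy lifting instead of regexes. For example, using htmlparser2 and domhandler (from some of my apps). This example parses a UL/LI/A menu, but the same tree-walking idea applies to your markup: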
```javascript
// Parser helpers
import { Parser } from 'htmlparser2';
import { DomHandler } from 'domhandler';

// Get all text contents, recursively
const getAllText = (node) => {
  return node.children.map((n) => {
    if (n.type === 'text') {
      // String.prototype.trim() takes no arguments; it strips all
      // surrounding whitespace (spaces, \n, \r, ...) from each text node
      return n.data.trim();
    }
    // Discard `small` tags
    if (n.name === 'small') {
      return '';
    }
    // Comments and other nodes without children contribute no text
    if (!n.children) {
      return '';
    }
    return getAllText(n);
  }).join('');
};

// Parses HTML data containing a UL/LI/A tree
const parseMenu = (data) => {
  const parseLink = (link) => {
    const name = getAllText(link);
    const code = link.attribs['data-value']?.trim();
    return {
      name,
      ...(code ? { code } : {}),
    };
  };

  const parseLi = (li) => {
    const ul = li.children.find(({ type, name }) => type === 'tag' && name === 'ul');
    const link = li.children.find(({ type, name }) => type === 'tag' && name === 'a');
    return {
      ...(link ? parseLink(link) : {}),
      ...(ul ? { children: parseUl(ul) } : {}),
    };
  };

  const parseUl = (ul) => {
    return ul.children
      .filter(({ type, name }) => type === 'tag' && name === 'li')
      .map((child) => parseLi(child));
  };

  let result;
  const handler = new DomHandler((error, dom) => {
    if (error) {
      // Handle error
    } else {
      // Parsing completed; assumes the first top-level node is the <ul>
      result = parseUl(dom[0]);
    }
  });
  const parser = new Parser(handler);
  parser.write(data);
  // htmlparser2 parses synchronously, so the handler callback has
  // already run (and set `result`) when end() returns
  parser.end();
  return result;
};
```
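The menu parser above does not produce the flat format asked for in the question. As a minimal sketch, assuming that only `strong`/`b` mean bold and `<a href>` means a link, a recursive walk over the same kind of node tree (objects with `type`, `name`, `attribs`, `children` and `data`, as domhandler builds them) can pass the inherited formatting down and emit one run per text node. The function name `flattenRuns`, the `BOLD_TAGS`/`BLOCK_TAGS` sets and the extra `block` field (the nearest enclosing `p`, `h1`…`h6`, `li` or `blockquote`, which is one way to keep an `<h1>` in the middle of the text distinguishable) are hypothetical choices of mine, not part of any library.

```javascript
// Sketch: flatten a domhandler-style node tree into formatted text runs.
// Hypothetical assumptions: strong/b mean bold, <a href> means link, and
// BLOCK_TAGS are the block-level containers worth recording per run.
const BOLD_TAGS = new Set(['strong', 'b']);
const BLOCK_TAGS = new Set(['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li', 'blockquote']);

// Walk `node`'s children depth-first, passing the inherited style down,
// and push one run per text node onto `out`.
function flattenRuns(node, style = { bold: false, href: null, block: null }, out = []) {
  for (const child of node.children ?? []) {
    if (child.type === 'text') {
      // Skip empty text, and whitespace-only text outside any block
      // (e.g. the newlines between two <p> elements).
      if (child.data === '' || (!style.block && child.data.trim() === '')) continue;
      const run = {
        type: style.href !== null ? 'link' : 'text',
        text: child.data,
        bold: style.bold,
        block: style.block,
      };
      if (style.href !== null) run.href = style.href;
      out.push(run);
    } else if (child.type === 'tag') {
      const name = child.name.toLowerCase();
      flattenRuns(child, {
        bold: style.bold || BOLD_TAGS.has(name),
        href: name === 'a' ? (child.attribs?.href ?? null) : style.href,
        block: BLOCK_TAGS.has(name) ? name : style.block,
      }, out);
    }
    // Other node types (comments, <script>, <style>, ...) are ignored.
  }
  return out;
}
```

With htmlparser2 you could call `flattenRuns(parseDocument(html))`, since `parseDocument` returns the root node of such a tree. cheerio also builds domhandler nodes, so passing `$.root()[0]` should work the same way (its implicit `html`/`body` wrappers are simply walked through). Adding italics or other marks is a matter of adding another flag to `style`.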
Use the cheerio library (or any other HTML parser library of your choice) and work with the resulting DOM node objects as you wish.