Turn an HTML string into an organized object

Lang: Node.js

I'm using a text editor, and it gives me an output string like this:

<p>This is <strong>a <a href="#">test</a></strong></p>

The string may contain other HTML tags such as H1 or H2, but nothing more unusual than ordinary HTML text tags.

Now I want to turn that string into an object that I can work with and send to my database. Ideally, it would be transformed into something like this:

[{type: "text", text: "This is ", bold: false}, {type: "text", text: "a ", bold: true}, {type: "link", text: "test", bold: true, href: "#"}]

and so on.

I tried a regex approach: splitting the string and applying all sorts of logic to turn it into a structured object. That can't be the best way to do it, though, since it will fail if, for example, I later write <h1>Test</h1> in the middle of the text.

How would you approach this?

If you want to keep it simple, a parser such as jsdom, or htmlparser2 together with domhandler, will help with that. For example, using htmlparser2 and domhandler (taken from one of my apps):

// Parsers helpers
import { Parser } from 'htmlparser2';
import { DomHandler } from 'domhandler';

// Get all text contents, recursively
const getAllText = (node) => {
  return node.children.map( n => {
    if (n.type === 'text') {
      return n.data.trim(); // trim() takes no arguments; it strips all leading/trailing whitespace
    }

    // Discard `small` tags
    if (n.name === 'small') {
      return ''
    }

    return getAllText(n);
  }).join('')
}

// Parses HTML data containing a UL/LI/A tree
const parseMenu = (data) => {

  const parseLink = (link) => {
    const name = getAllText(link);
    const code = link.attribs['data-value']?.trim();
    return {
      name,
      ...(code ? {code} : {}),
    }
  }

  const parseLi = (li) => {
    const ul = li.children.find(({type, name}) => type === 'tag' && name === 'ul' );
    const link = li.children.find(({type, name}) => type === 'tag' && name === 'a' );
    return {
      ...(link ? parseLink(link) : {}),
      ...(ul ? {children:  parseUl(ul)} : {}),
    }
  }

  const parseUl = (ul) => {
    return ul.children.filter(({type, name}) => type === 'tag' && name === 'li' ).map( child => {
      return parseLi(child);
    });
  }

  let result;
  const handler = new DomHandler( (error, dom) => {
    if (error) {
      // Handle error
    } else {
      // Parsing completed, do something
      result = parseUl(dom[0]);
    }
  });

  const parser = new Parser(handler);
  parser.write(data);
  parser.end();
  return result;
}
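
The same tree-walking idea can be pointed at the format from the question. Below is a minimal sketch, not a definitive implementation: `flatten` and `FLAG_TAGS` are hypothetical names, and the `dom` array is built by hand in the node shape the code above already relies on (`type`, `name`, `attribs`, `children`, `data`), so the snippet runs without any dependency. In a real app the tree would come from the `DomHandler` callback instead.

```javascript
// Assumption: which tags switch on which flag. Extend as needed (em, u, ...).
const FLAG_TAGS = new Map([
  ['strong', 'bold'],
  ['b', 'bold'],
]);

// Walk the nodes depth-first, carrying the active style and link target down,
// and emit one flat segment per text node.
const flatten = (nodes, style = { bold: false }, href = null, out = []) => {
  for (const node of nodes) {
    if (node.type === 'text') {
      if (node.data === '') continue; // skip empty text nodes
      out.push(href === null
        ? { type: 'text', text: node.data, ...style }
        : { type: 'link', text: node.data, ...style, href });
    } else if (node.type === 'tag') {
      const flag = FLAG_TAGS.get(node.name);
      const nextStyle = flag ? { ...style, [flag]: true } : style;
      const nextHref = node.name === 'a' ? (node.attribs.href ?? null) : href;
      flatten(node.children, nextStyle, nextHref, out);
    }
    // Other node types (comments, etc.) are ignored.
  }
  return out;
};

// Hand-built stand-in for the parsed form of:
// <p>This is <strong>a <a href="#">test</a></strong></p>
const dom = [{
  type: 'tag', name: 'p', attribs: {}, children: [
    { type: 'text', data: 'This is ' },
    { type: 'tag', name: 'strong', attribs: {}, children: [
      { type: 'text', data: 'a ' },
      { type: 'tag', name: 'a', attribs: { href: '#' }, children: [
        { type: 'text', data: 'test' },
      ] },
    ] },
  ],
}];

const segments = flatten(dom);
console.log(JSON.stringify(segments));
```

Tags that are not in `FLAG_TAGS`, such as `p` or `h1`, are simply walked through here, so an `<h1>` in the middle of the text would not break anything; if you need to keep the block type, you could carry it down the recursion the same way `href` is carried.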

Use the cheerio library (or any other HTML parser library of your choice) and work with the DOM node objects as you wish.
