How can I determine if a string contains XML in node.js?

Given an arbitrary string, how can I determine if it contains XML, and parse it out in a node.js app?

Ex.

var s = 'hello world <hello type="greeting">world</hello>';

I've tried nodexml and xml2js, but both of them require the entire string to be XML.

Edit for clarity:

Ideally I'd like something like:

var s = 'hello world <hello type="greeting">world</hello>';
var parsed = parse( s );
console.log( parsed );
{
  originalString: 'hello world <hello type="greeting">world</hello>',
  textOnly: 'hello world ',
  js: {
    hello: {
      type: 'greeting',
      '@text': 'world'
    }
  }
}

You could try loading your string using node-htmlparser:

npm install htmlparser

Since its parser is forgiving with malformed and partial HTML strings, you should be able to load any input and then check the parsed result for tags in order to determine whether the input contained markup.
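A minimal sketch of that check, assuming the htmlparser package's documented `DefaultHandler` / `Parser` / `parseComplete` API and its DOM node fields (`type`, `name`, `children`). The helper names `findTags` and `containsMarkup` are hypothetical and not part of the library:

```javascript
// Collects the names of all element nodes in an htmlparser-style DOM array.
// findTags is a hypothetical helper; it relies only on the `type`, `name`
// and `children` fields that htmlparser's DefaultHandler sets on each node.
function findTags(dom, found = []) {
  for (const node of dom || []) {
    if (node.type === 'tag') {
      found.push(node.name);
      findTags(node.children, found); // descend into nested elements
    }
  }
  return found;
}

// Parses the input with htmlparser and reports whether any tag was found.
// Assumes `npm install htmlparser` has been run; the module is loaded lazily
// so that findTags can be used on its own.
function containsMarkup(input) {
  const htmlparser = require('htmlparser');
  const handler = new htmlparser.DefaultHandler(function (error) {
    if (error) throw error;
  });
  new htmlparser.Parser(handler).parseComplete(input);
  return findTags(handler.dom).length > 0;
}
```

Because the parser is forgiving, this only tells you that something tag-like was found, not that the embedded fragment is well-formed XML.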

My recommendation is to use htmlparser2 (demo).

npm install htmlparser2

A forgiving HTML/XML/RSS parser. The parser can handle streams and provides a callback interface. This is a fork of the htmlparser module. The main difference is that this is intended to be used only with node (it runs on other platforms using browserify).

Tested with the following data:

// Single-quoted pieces so the double quotes inside the markup need no escaping.
var input = 'Hello This is Bikram' +
    '<hello type="greeting">world</hello>' +
    '<head>' +
      '<meta charset="utf8"/>' +
      '<title>Page Title</title>' +
    '</head>' +
    '<body>' +
      '<a href="https://github.com/ForbesLindesay">' +
        '<img src="/static/forkme.png" alt="Fork me on GitHub">' +
      '</a>' +
    '</body>' +
    'Sample answer for stackoverflow!!!';

Output: Refer to the demo link for the output.
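As a rough illustration of the callback interface, the sketch below builds an object shaped like the one asked for in the question. It assumes htmlparser2's `Parser` with the `onopentag`, `ontext` and `onclosetag` callbacks and the `xmlMode` option. `createCollector` and `parseMixed` are hypothetical names, and the added `containsXml` flag is not part of the question's desired output:

```javascript
// Builds a handler that separates top-level text from element content.
// createCollector is a hypothetical helper, not part of htmlparser2.
function createCollector() {
  const result = { textOnly: '', js: {} };
  const stack = [result.js]; // stack[0] is the root; the top is the open element

  const handler = {
    // Called with the tag name and a map of its attributes.
    onopentag(name, attribs) {
      const node = Object.assign({}, attribs);
      stack[stack.length - 1][name] = node; // a repeated sibling name overwrites
      stack.push(node);
    },
    // Text outside any element is plain text; text inside goes to '@text'.
    ontext(text) {
      if (stack.length === 1) {
        result.textOnly += text;
      } else {
        const node = stack[stack.length - 1];
        node['@text'] = (node['@text'] || '') + text;
      }
    },
    onclosetag() {
      if (stack.length > 1) stack.pop();
    },
  };

  return { handler, result };
}

// Wires the collector to htmlparser2 (assumes `npm install htmlparser2`).
// The module is loaded lazily so the collector can be exercised without it.
function parseMixed(input) {
  const { Parser } = require('htmlparser2');
  const { handler, result } = createCollector();
  const parser = new Parser(handler, { xmlMode: true });
  parser.write(input);
  parser.end();
  return {
    originalString: input,
    textOnly: result.textOnly,
    js: result.js,
    containsXml: Object.keys(result.js).length > 0,
  };
}
```

Calling `parseMixed(s)` with the string from the question returns an object with `originalString`, `textOnly` and `js` fields. Attributes and child elements share one object per node here, so an attribute and a child with the same name would collide; a real implementation would likely keep them apart.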

Performance Measurement:

gumbo-parser   : 34.9208 ms/file ± 21.4238
html-parser    : 24.8224 ms/file ± 15.8703
html5          : 419.597 ms/file ± 264.265
htmlparser     : 60.0722 ms/file ± 384.844
htmlparser2-dom: 12.0749 ms/file ± 6.49474
htmlparser2    : 7.49130 ms/file ± 5.74368
hubbub         : 30.4980 ms/file ± 16.4682
libxmljs       : 14.1338 ms/file ± 18.6541
parse5         : 22.0439 ms/file ± 15.3743
sax            : 49.6513 ms/file ± 26.6032
