
How can I determine if a string contains XML in node.js?

Given an arbitrary string, how can I determine if it contains XML, and parse it out in a node.js app?

Example:

var s = 'hello world <hello type="greeting">world</hello>';

I've tried nodexml and xml2js, but both of them require the entire string to be XML.

Edit for clarity:

Ideally I'd like something like:

var s = 'hello world <hello type="greeting">world</hello>';
var parsed = parse( s );
console.log( parsed );
// desired output:
// {
//   originalString: 'hello world <hello type="greeting">world</hello>',
//   textOnly: 'hello world ',
//   js: {
//     hello: {
//       type: 'greeting',
//       '@text': 'world'
//     }
//   }
// }

You could try loading your string using node-htmlparser:

npm install htmlparser

Since its parser is forgiving with malformed and partial HTML strings, you should be able to load any input and then check the parsed result for tag nodes in order to determine whether the string actually contained markup.

My recommendation is to use htmlparser2 (demo).

npm install htmlparser2

A forgiving HTML/XML/RSS parser. The parser can handle streams and provides a callback interface. This is a fork of the htmlparser module. The main difference is that this is intended to be used only with node (it runs on other platforms using browserify).

Tested with the following data:

// Outer strings use single quotes so the attribute quotes inside need no escaping.
var input = 'Hello This is Bikram' +
    '<hello type="greeting">world</hello>' +
    '<head>' +
    '<meta charset="utf8"/>' +
    '<title>Page Title</title>' +
    '</head>' +
    '<body>' +
    '<a href="https://github.com/ForbesLindesay">' +
    '<img src="/static/forkme.png" alt="Fork me on GitHub">' +
    '</a>' +
    '</body>' +
    'Sample answer for stackoverflow!!!';

Output: refer to the demo link for the output.

Performance measurement:

gumbo-parser   : 34.9208 ms/file ± 21.4238
html-parser    : 24.8224 ms/file ± 15.8703
html5          : 419.597 ms/file ± 264.265
htmlparser     : 60.0722 ms/file ± 384.844
htmlparser2-dom: 12.0749 ms/file ± 6.49474
htmlparser2    : 7.49130 ms/file ± 5.74368
hubbub         : 30.4980 ms/file ± 16.4682
libxmljs       : 14.1338 ms/file ± 18.6541
parse5         : 22.0439 ms/file ± 15.3743
sax            : 49.6513 ms/file ± 26.6032
