Parsing an XML file from a URL and looping to get all URLs in it using Node.js
I'm using the Node module xml2js. My XML file has the following form:
<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl"?>
<?xml-stylesheet type="text/css" media="screen" href="some url" ?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
<channel>
<item>
<pubDate>Fri, 19 Sep 2014 18:00:08 GMT</pubDate>
<guid isPermaLink="false">http://www.example0.com</guid>
</item>
<item>
<pubDate>Fri, 19 Sep 2014 17:52:25 GMT</pubDate>
<guid isPermaLink="false">http://www.example1.com</guid>
</item>
</channel>
</rss>
I want to get all the URLs under <item><guid isPermaLink="false"> as an array.
I'm trying out the code below, but it only handles a locally stored XML file. Also, I'm unable to get the URLs:
var fs = require('fs'),
    xml2js = require('xml2js');

var parser = new xml2js.Parser();

// Fires once the whole document has been parsed.
parser.addListener('end', function (result) {
    console.dir(result);
    console.log('Done.');
});

fs.readFile(__dirname + '/foo.xml', function (err, data) {
    if (err) {
        throw err; // e.g. the file is missing
    }
    parser.parseString(data);
});
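A note on the xml2js result, as a sketch rather than a definitive fix: with xml2js's default options, child elements are wrapped in arrays, attributes are stored under `$`, and element text under `_`. `console.dir` also stops printing at a shallow depth by default, which is likely why the URLs do not show up in the output. The helper name `extractGuidUrls` and the hand-written `sample` object below are assumptions for illustration; they stand in for the `result` passed to the 'end' listener.

```javascript
// Hypothetical helper: pulls guid URLs out of an object shaped like
// xml2js's default output (children in arrays, attributes under `$`,
// element text under `_`).
function extractGuidUrls(result) {
  var urls = [];
  var channels = (result.rss && result.rss.channel) || [];
  channels.forEach(function (channel) {
    (channel.item || []).forEach(function (item) {
      (item.guid || []).forEach(function (guid) {
        // Keep only <guid isPermaLink="false"> elements.
        if (guid.$ && guid.$.isPermaLink === 'false') {
          urls.push(guid._);
        }
      });
    });
  });
  return urls;
}

// Hand-written stand-in for the parsed feed from the question.
var sample = {
  rss: {
    channel: [{
      item: [
        { guid: [{ _: 'http://www.example0.com', $: { isPermaLink: 'false' } }] },
        { guid: [{ _: 'http://www.example1.com', $: { isPermaLink: 'false' } }] }
      ]
    }]
  }
};

console.log(extractGuidUrls(sample));
```

Inside the 'end' listener, calling `extractGuidUrls(result)` in place of `console.dir(result)` would return the array, provided the parser was created with default options.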
You can use the sax-js module to extract the URLs you need. The module you mentioned, xml2js, uses sax-js internally.
Here is the code (a rough cut):
'use strict';

var sax = require('sax');
var fs = require('fs');

var filePath = __dirname + '/' + 'foo.xml';

// Set when a matching <guid> has just opened, so the next text node is its URL.
var isTextPending = false;

// `true` turns on strict mode, which keeps tag names case-sensitive.
var saxStream = sax.createStream(true);

saxStream.on('error', function (e) {
    console.error(e);
});

saxStream.on('text', function (text) {
    if (isTextPending) {
        console.log(text);
        isTextPending = false;
    }
});

saxStream.on('opentag', function (node) {
    if (node.name === 'guid' && node.attributes.isPermaLink === 'false') {
        isTextPending = true;
    }
});

fs.createReadStream(filePath)
    .pipe(saxStream);
And the output is:
http://www.example0.com
http://www.example1.com
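The handlers above print each URL as it is found, while the question asks for an array. One possible way to collect them is sketched below. The helper name `collectGuidUrls` and its callback signature are assumptions for illustration and are not part of sax-js; the helper only relies on the stream emitting 'opentag', 'text', 'closetag', 'error' and 'end' events.

```javascript
// Hypothetical helper (not part of sax-js): gathers the text of every
// <guid isPermaLink="false"> element seen on a sax stream and hands the
// resulting array to `done(err, urls)` once parsing ends.
function collectGuidUrls(saxStream, done) {
  var urls = [];
  var current = null; // text buffer while inside a matching <guid>
  var finished = false;

  function finish(err) {
    if (finished) { return; } // report only once
    finished = true;
    done(err, err ? undefined : urls);
  }

  saxStream.on('opentag', function (node) {
    if (node.name === 'guid' && node.attributes.isPermaLink === 'false') {
      current = '';
    }
  });

  saxStream.on('text', function (text) {
    if (current !== null) {
      current += text;
    }
  });

  saxStream.on('closetag', function (name) {
    if (name === 'guid' && current !== null) {
      urls.push(current.trim());
      current = null;
    }
  });

  saxStream.on('error', finish);
  saxStream.on('end', function () { finish(null); });
}

// Usage with a stream like the one above (assumes `sax` is installed):
// var saxStream = require('sax').createStream(true);
// collectGuidUrls(saxStream, function (err, urls) { console.log(urls); });
// require('fs').createReadStream(filePath).pipe(saxStream);
```

Buffering text until the closing tag, instead of taking only the first text node, is a design choice that keeps the helper working if the parser delivers the element text in more than one piece.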
UPD:
To fetch the XML from the internet and process it, use the request module:
var request = require('request');
var href = 'http://SOME_URL.xml';

request(href)
    .pipe(saxStream);
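The request module has since been deprecated, so here is a rough alternative that uses only Node's built-in http and https modules. The helper name `pipeUrlTo` and its parameters are assumptions for illustration. Unlike request, this sketch does not follow redirects and treats any status other than 200 as an error.

```javascript
var http = require('http');
var https = require('https');

// Hypothetical helper: streams the response body at `href` into
// `destination` (for example the saxStream from the code above).
function pipeUrlTo(href, destination, onError) {
  // Pick the client module that matches the URL scheme.
  var client = href.indexOf('https:') === 0 ? https : http;

  client.get(href, function (res) {
    if (res.statusCode !== 200) {
      res.resume(); // discard the body so the socket is released
      onError(new Error('Unexpected status code: ' + res.statusCode));
      return;
    }
    res.pipe(destination);
  }).on('error', onError);
}

// Usage, assuming `saxStream` is set up as in the code above:
// pipeUrlTo('http://SOME_URL.xml', saxStream, console.error);
```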