HTML Scraping with Javascript

I use a simple JavaScript script, run from a batch file, to download audio and video (radio and TV shows) from the BBC iPlayer.

Part of the script extracts data from the BBC's XML pages.

I now want to try extracting data from an HTML page. Can anyone point me to a JavaScript method for extracting data from an ordinary .htm or .html page?

I want to keep things simple by having a JavaScript routine that I can include in an HTML page on my website, so I'm only interested in JavaScript solutions. Thanks.

Edit, 24 Aug -

The BBC's HTML pages don't respond to the JavaScript scripts which successfully parse their XML pages.

I use a simple JavaScript function to interrogate XML, based on this -

    function loadXML() {
        xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
        xmlDoc.async = false;
        xmlDoc.onreadystatechange = readXML;
        xmlDoc.load(url);
    }

Your question is rather vague. I think there are two ways to do this:

1. Apply a RegExp to match patterns in the HTML text.
2. Import the HTML into a DOM simulator and walk the tree to find the data (I assume you are using Node.js).
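To make the two options concrete, here is a minimal sketch of each. The sample markup, the function names (`extractTitle`, `extractLinks`, `extractLinksWithDom`) and the idea of pulling out the title and link targets are all assumptions made for illustration; they are not taken from a real BBC page. The regular-expression version only handles double-quoted `href` values and will break on unusual markup. The DOM version relies on `DOMParser`, which browsers provide; under Node.js a separate DOM library would be needed, so the sketch simply returns `null` there.

```javascript
// Invented sample markup for illustration only (not a real BBC page).
const sampleHtml = [
  '<html><head><title>Example Programme</title></head>',
  '<body>',
  '<a href="/episode/one">Episode 1</a>',
  '<a class="x" href="/episode/two">Episode 2</a>',
  '</body></html>'
].join('\n');

// Approach 1: regular expressions. Dependency-free but fragile.
// Returns the text inside the first <title> element, or null if there is none.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Returns every double-quoted href value found on an <a> tag.
function extractLinks(html) {
  const pattern = /<a\b[^>]*\bhref="([^"]*)"/gi;
  const links = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Approach 2: parse the text into a DOM and query the tree.
// DOMParser is a browser API; this returns null where it is unavailable.
function extractLinksWithDom(html) {
  if (typeof DOMParser === 'undefined') {
    return null;
  }
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return Array.from(doc.querySelectorAll('a[href]'), a => a.getAttribute('href'));
}

console.log(extractTitle(sampleHtml));
console.log(extractLinks(sampleHtml));
```

The regular-expression route is usually fine for pulling one or two known fields out of a page whose layout you control or know well. Walking a parsed DOM tends to hold up better when the markup changes.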
