Web-Scraping of tables within a timetable
I am fairly new to web scraping, but as part of a project I am working on I'm trying to scrape details of classes from this timetable: https://www101.dcu.ie/timetables/feed.php?prog=case&per=2&week1=19&week2=30&day=7&hour=1-20&template=Studprog
I'm going to try to use jsoup, but I am not sure how exactly to parse the data so that only the relevant information is returned.
Any help or insight would be greatly appreciated.
You can use iconv-lite and cheerio (both are Node.js libraries).
I made a working example for you to look at:
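const rp = require('request-promise');
const iconv = require('iconv-lite');
const cheerio = require('cheerio');

// Returns a function that fetches a URL with the given HTTP method and
// resolves to a cheerio instance loaded with the decoded page.
const getRequestDefault = (method) => (url) =>
  rp({
    encoding: null, // return the body as a raw Buffer instead of a string
    method: method,
    uri: url,
    rejectUnauthorized: false // skips TLS certificate verification
  })
    .then(body => {
      // Decode the raw bytes as ISO-8859-1 before parsing the HTML.
      // Buffer is passed directly; `new Buffer(...)` is deprecated.
      const $ = cheerio.load(iconv.decode(body, 'ISO-8859-1'));
      return $;
    });

const getRows = () =>
  getRequestDefault('GET')(`https://www101.dcu.ie/timetables/feed.php?prog=case&per=2&week1=19&week2=30&day=7&hour=1-20&template=Studprog`)
    .then($ => {
      // Example: print the text of every table row.
      $('table tbody tr')
        .toArray()
        .forEach(row => {
          console.log($(row).text());
        });
    });

getRows();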
This prints the text of every tr row of every table on the page.
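The text of a table row usually comes back padded with newlines, tabs, and non-breaking spaces, so a small clean-up step helps before picking out the relevant fields. The sketch below is one possible way to do that, not part of the original example: `cleanRowTexts` is a hypothetical helper name, and the sample strings are made up for illustration rather than taken from the timetable page.

```javascript
// Hypothetical helper (an assumption, not part of the original answer):
// tidy the raw strings produced by $(row).text() before using them.
function cleanRowTexts(rowTexts) {
  return rowTexts
    // Collapse runs of whitespace (including tabs, newlines and
    // non-breaking spaces) into a single space, then trim the ends.
    .map(text => text.replace(/\s+/g, ' ').trim())
    // Drop rows that contained nothing but whitespace.
    .filter(text => text.length > 0);
}

// Made-up sample input; real values would come from the scraped rows.
const sample = ['\n  Mon \t 09:00 \n', '   ', 'CA116\u00a0Lab  '];
console.log(cleanRowTexts(sample));
```

To use it with the example above, the row texts could be collected into an array (for instance with `.toArray().map(row => $(row).text())`) and passed to the helper instead of being logged one by one.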
You can use this as a starting point. Just copy the code into a .js file, install the dependencies, and run:
node file.js
To install the dependencies:
npm install cheerio iconv-lite request request-promise
I hope this helps.