
Web-Scraping of tables within a timetable

I am fairly new to web scraping, but as part of a project I am working on I'm trying to scrape details of classes from this timetable: https://www101.dcu.ie/timetables/feed.php?prog=case&per=2&week1=19&week2=30&day=7&hour=1-20&template=Studprog I plan to use jsoup, but I'm not sure how to parse the page so that it returns only the relevant information. Any help or insight would be greatly appreciated.

You can use iconv-lite and cheerio.

Here is a working example:

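const rp = require('request-promise');
const iconv = require('iconv-lite');
const cheerio = require('cheerio');

// Build a request function for the given HTTP method. The returned function
// fetches a URL and resolves with the page loaded into cheerio.
const getRequestDefault = (method) => (url) =>
    rp({
        encoding: null, // resolve with a raw Buffer instead of a decoded string
        method: method,
        uri: url,
        rejectUnauthorized: false // skips TLS certificate checks; remove if the certificate validates
    })
        .then(body => {
            // body is already a Buffer, so decode it as ISO-8859-1 before parsing
            const $ = cheerio.load(iconv.decode(body, 'ISO-8859-1'));

            return $;
        });

const getRows = () =>
    getRequestDefault('GET')('https://www101.dcu.ie/timetables/feed.php?prog=case&per=2&week1=19&week2=30&day=7&hour=1-20&template=Studprog')
        .then($ => {
            // Example: print the text of every table row
            $('table tbody tr')
                .toArray()
                .forEach(
                    a => {
                        console.log($(a).text());
                    }
                );
        });

// Log request or parsing failures instead of leaving the rejection unhandled
getRows().catch(err => console.error(err));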

This prints the text of every tr row in every table on the page.
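
To get closer to "only the relevant information", each row can be split into its cells instead of being printed as one string. The following is a minimal sketch, not a tested scraper for this page: the helper names cleanText, extractRows and nonEmptyRows are made up for illustration, and the 'table tr' and 'td, th' selectors are assumptions, since the timetable's real markup may nest tables or need narrower selectors. It expects the $ that getRequestDefault resolves with in the example above.

```javascript
// Collapse runs of whitespace into single spaces and trim the ends.
// JavaScript's \s also matches non-breaking spaces (&nbsp;).
const cleanText = (text) => text.replace(/\s+/g, ' ').trim();

// Convert a loaded cheerio document into an array of rows,
// where each row is an array of cleaned cell strings.
// `$` is assumed to be a cheerio instance (hypothetical helper, not part of cheerio).
const extractRows = ($) =>
    $('table tr')
        .toArray()
        .map(row =>
            $(row)
                .children('td, th')
                .toArray()
                .map(cell => cleanText($(cell).text()))
        );

// Keep only rows that have text in at least one cell.
const nonEmptyRows = (rows) =>
    rows.filter(row => row.some(cell => cell !== ''));

// Possible usage with the example above:
// getRequestDefault('GET')(url)
//     .then($ => console.log(nonEmptyRows(extractRows($))));
```

From there, picking out the relevant fields is a matter of indexing into each row array once you have checked which column holds which value on the real page.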

You can use this as a starting point. Copy the code into a .js file, install the dependencies, and run: node file.js

To install the dependencies: npm install cheerio iconv-lite request request-promise

I hope this helps.

