
What is the proper way to approach parsing this website?

The Scenario:

I'm trying to grab schedule information from a school website. The school offers a class search tool covering all classes; it is a PHP form that sends the selection as POST data once you have chosen your courses here:

https://campus.concordia.ca/psc/pscsprd/EMPLOYEE/HRMS/c/CU_EXT.CU_CLASS_SEARCH.GBL

For a small sample I would choose:

  1. Term: Fall 2016
  2. Course Career: Undergraduate
  3. Select Subject: CIVI

I'm fairly new to JavaScript/jQuery, so I'm not sure what options I have. The layout of the website is also really hard to navigate...

The things I tried:

var elems = document.body.getElementsByTagName("span");
for (var i = 0; i < elems.length; i++) {
    console.log(elems[i]);
}

My initial attempt was to identify the element structure and isolate the text; unfortunately, it returns far too many elements.

However, I noticed that the element IDs follow a naming convention, so I pasted the jQuery library into the console and ran:

//for the details of the courses
var tempArray =  $('[id^="MTG_"]').map(function() { return this.innerText}).get().join();

//for the name of the courses
$('[id^="SSR_CLSRSLT_WRK_GROUPBOX2"]').map(function() {return this.title}).get().join();

TL;DR The Problem:

The course names and their details aren't linked to each other by any numeric order. Instead, the information is divided into massive tables matched by:

$('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]'), each of which also contains the course name. Then, after a lot of td/tr elements, it has the details of the course.

Is there a way to use a selector twice in a row to isolate what is needed? For example, something like this:

//I know this doesn't work but something like this would be nice

$('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]'). $('[id^="MTG_"]').map(function() { return this.innerText}).get().join();

Or is there a better way?

You can use the jQuery find() method like this:

$('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]').find('[id^="MTG_"]').map(function() {
    return this.innerText
}).get().join();

This will find the 'MTG_' elements that are descendants of the 'ACE_SSR_CLSRSLT_WRK_GROUPBOX2' elements (both direct children and those nested further down).
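If the goal is to keep each course name together with its own details, the same scoping idea can be applied one container at a time. The following is only a minimal sketch: it uses the standard DOM methods querySelector/querySelectorAll in place of jQuery's find() so that nothing needs to be pasted into the console, and collectCourses is a hypothetical helper name. It assumes, as described in the question, that each 'ACE_SSR_CLSRSLT_WRK_GROUPBOX2' element wraps both the course name element and that course's 'MTG_' cells; the real page structure may differ.

```javascript
// Sketch: build one { name, details } record per course container.
// Assumption (from the question, not verified against the live page):
// every element whose id starts with "ACE_SSR_CLSRSLT_WRK_GROUPBOX2"
// contains the course name element and that course's "MTG_" cells.
function collectCourses(root) {
  var boxes = root.querySelectorAll('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]');
  return Array.prototype.map.call(boxes, function (box) {
    // Scoped lookups: only descendants of this container are searched.
    var nameEl = box.querySelector('[id^="SSR_CLSRSLT_WRK_GROUPBOX2"]');
    var cells = box.querySelectorAll('[id^="MTG_"]');
    return {
      name: nameEl ? nameEl.title : '',
      details: Array.prototype.map.call(cells, function (cell) {
        return cell.innerText.trim();
      })
    };
  });
}

// In the browser console, run it against the whole page.
if (typeof document !== 'undefined') {
  console.log(collectCourses(document));
}
```

Each record keeps the name and the details side by side, so no numeric ordering between the two selector results is needed.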


 