What is the proper way to approach parsing this website?

The Scenario:

I'm trying to grab information about the available schedules from a school's website. They have a tool that's available for all classes, which is a PHP form that sends POST data after you select your courses here:

https://campus.concordia.ca/psc/pscsprd/EMPLOYEE/HRMS/c/CU_EXT.CU_CLASS_SEARCH.GBL

For a small sample I would choose:

  1. Term: Fall 2016
  2. Course Career: Undergraduate
  3. Select Subject: CIVI

I'm fairly new to JavaScript/jQuery, so I'm not sure what options I have. The layout of the website is also really hard to navigate...

The things I tried:

var elems = document.body.getElementsByTagName("span");
for (var i = 0; i < elems.length; i++) {
    console.log(elems[i]);
}

My initial attempt was to identify the element structure and isolate the text. Unfortunately, it returns far too many elements.

However, I noticed the naming pattern that the element IDs follow, so I copy-pasted the jQuery library into the console and ran:

//for the details of the courses
var tempArray = $('[id^="MTG_"]').map(function() { return this.innerText }).get().join();

//for the name of the courses
$('[id^="SSR_CLSRSLT_WRK_GROUPBOX2"]').map(function() { return this.title }).get().join();

TL;DR The Problem:

The course names and their details aren't linked together by any numeric ordering. Instead, the information is divided into massive tables under the ID:

$('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]'), which also contains the name of the course. Then, after a lot of td/tr elements, it has the details of the course.

Is there a way to use the selector twice in a row to isolate what is needed? For example, something like this:

//I know this doesn't work, but something like this would be nice
$('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]').$('[id^="MTG_"]').map(function() { return this.innerText }).get().join();

Or is there a better way?

You can use the jQuery find() method like this:

$('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]').find('[id^="MTG_"]').map(function() {
    return this.innerText
}).get().join();

This finds the 'MTG_' elements that are descendants of the 'ACE_SSR_CLSRSLT_WRK_GROUPBOX2' elements (both direct children and elements nested further down).
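
Note that joining everything into one string still loses which details belong to which course. If the goal is to keep each course name together with its own details, one option is to loop over the containers and search inside each one separately. The following is only a rough sketch, not tested against the live page: the function name groupCourses is made up for illustration, and it assumes that every 'ACE_SSR_CLSRSLT_WRK_GROUPBOX2' container holds one 'SSR_CLSRSLT_WRK_GROUPBOX2' element (with the course name in its title attribute, as in the question) plus the 'MTG_' elements for that course. It uses the standard querySelectorAll/querySelector DOM methods, which accept the same attribute selectors, so jQuery does not need to be pasted into the console.

```javascript
// Sketch: pair each course name with the details found in the same container.
// Assumption: each ACE_SSR_CLSRSLT_WRK_GROUPBOX2 container wraps exactly one
// course (its name element plus its MTG_ detail elements).
function groupCourses(root) {
  var containers = root.querySelectorAll('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]');

  return Array.prototype.map.call(containers, function (container) {
    // The name element's id starts with SSR_, so the ACE_ container
    // itself is not matched by this selector.
    var nameEl = container.querySelector('[id^="SSR_CLSRSLT_WRK_GROUPBOX2"]');
    var detailEls = container.querySelectorAll('[id^="MTG_"]');

    return {
      name: nameEl ? nameEl.title : '',
      details: Array.prototype.map.call(detailEls, function (el) {
        return el.innerText.trim();
      })
    };
  });
}

// In the browser console, after running a search:
// console.log(groupCourses(document));
```

Each entry in the returned array has a name and a details array, so the relation between a course and its meeting details is kept instead of being flattened into one comma-separated string.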
