What is the proper way to approach parsing this website?
I'm trying to grab the schedule information that a school's website makes available. They have a tool covering all classes, a PHP form that sends POST data after you select your courses, here:
https://campus.concordia.ca/psc/pscsprd/EMPLOYEE/HRMS/c/CU_EXT.CU_CLASS_SEARCH.GBL
For a small sample I would choose:
I'm fairly new to JavaScript/jQuery, so I'm not sure what options I have. The layout of the website is also really hard to navigate...
var elems = document.body.getElementsByTagName("span");
for (var i = 0; i < elems.length; i++) {
    console.log(elems[i]);
}
My initial attempt was to identify the element structure and isolate the text; unfortunately, it returns far too many things.
However, I noticed the naming conventions that the element IDs follow, so I copy-pasted the jQuery library into the console and ran:
//for the details of the courses
var tempArray = $('[id^="MTG_"]').map(function() { return this.innerText}).get().join();
//for the name of the courses
$('[id^="SSR_CLSRSLT_WRK_GROUPBOX2"]').map(function() {return this.title}).get().join();
The course names and their details aren't linked together by any numeric ordering. Instead, the information is divided into massive tables under the ID:
$('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]')
which also contain the names of the courses. Then, after a lot of td/tr elements, each one has the details of the course.
Is there a way to use the selector twice in a row to isolate what is needed? For example, something like this:
//I know this doesn't work but something like this would be nice
$('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]'). $('[id^="MTG_"]').map(function() { return this.innerText}).get().join();
Or is there a better way?
You can use the jQuery find() method like this:
$('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]').find('[id^="MTG_"]').map(function() {
    return this.innerText;
}).get().join();
This will find the 'MTG_' elements that are descendants of the 'ACE_SSR_CLSRSLT_WRK_GROUPBOX2' elements (both direct children and anything nested further down).
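If the goal is to keep each course name attached to its own details, the same scoping idea can be applied once per table rather than to all tables at once. The following is only a sketch: it uses the browser's built-in querySelectorAll instead of jQuery (so nothing has to be pasted into the console), collectCourses is a made-up helper name, and it assumes that the element carrying the course title sits inside the matching ACE_ table, which may not hold for every page.

```javascript
// Hypothetical helper (not part of the original answer): pair each course
// name with the details found inside the same results table.
// `root` is anything that supports querySelectorAll, e.g. `document`.
function collectCourses(root) {
  var groups = root.querySelectorAll('[id^="ACE_SSR_CLSRSLT_WRK_GROUPBOX2"]');
  return Array.prototype.map.call(groups, function (group) {
    // Assumption: the title element is nested somewhere inside the table.
    var titleEl = group.querySelector('[id^="SSR_CLSRSLT_WRK_GROUPBOX2"]');
    // Scoped lookup: only MTG_ elements inside this particular table.
    var details = group.querySelectorAll('[id^="MTG_"]');
    return {
      name: titleEl ? titleEl.title : '',
      details: Array.prototype.map.call(details, function (el) {
        return (el.innerText || el.textContent || '').trim();
      })
    };
  });
}

// In the browser console, on the results page:
if (typeof document !== 'undefined') {
  console.log(collectCourses(document));
}
```

Each entry in the returned array has a name string and a details array, so the two stay linked without relying on the order of the elements across the whole page.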