I am using Selenium on the following website: http://calstate-la.bncollege.com/webapp/wcs/stores/servlet/TBWizardView?catalogId=10001&langId=-1&storeId=30556 .
My goal is to get every possible combination of department, course, and section from their respective drop-down menus. The main problem is that I can't find a way to read the values from those menus.
Answers to similar Stack Overflow questions rely on select and option tags. However, when I looked at this page's source, the drop-down menus did not use those tags.
So I need help getting all the combinations from the drop-down menus, but I don't know how to proceed in this special case. I should also mention that I am working in Python.
I've actually tried to use selenium here, but it becomes painful very fast due to the asynchronous nature of the page and its "artificial" dropdowns (here is what I had so far).
Here's an alternative approach using requests and BeautifulSoup (no need for a browser at all).
The idea is to simulate the underlying requests that fill up the dropdowns:
from bs4 import BeautifulSoup
import requests

CATALOG = 10001
STORE = 30556

url = ('http://calstate-la.bncollege.com/webapp/wcs/stores/servlet/TBWizardView'
       '?catalogId={catalog}&langId=-1&storeId={store}').format(catalog=CATALOG, store=STORE)
xhr_url = 'http://calstate-la.bncollege.com/webapp/wcs/stores/servlet/TextBookProcessDropdownsCmd'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36'}

# Reuse one session so cookies from the page load are sent with the later requests.
session = requests.Session()
response = session.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# The campus id comes from the input named "campus1"; the terms are <li> elements.
campus = soup.find('input', attrs={'name': 'campus1'}).get('value')
book_row = soup.find('div', class_='bookRowContainer')

params = {
    'campusId': campus,
    'deptId': '',
    'courseId': '',
    'sectionId': '',
    'storeId': STORE,
    'catalogId': CATALOG,
    'langId': '-1',
    'dropdown': 'term'
}

# Request the department list once per term.
terms = book_row.select('li.termOption')
for term in terms:
    params['termId'] = term.get('data-optionvalue')
    response = session.post(xhr_url, params=params, headers=headers)
    print(response.text)
This would print all of the departments for all terms in JSON format.
For Fall 2014:
[
{"categoryName":"AAAS","categoryId":"63420700","categoryIdentifier":"670_1_F14_4","title":"AAAS"},
{"categoryName":"ACCT","categoryId":"63420752","categoryIdentifier":"670_1_F14_5","title":"ACCT"},
...
]
For Summer 2014:
[
{"categoryName":"AAAS","categoryId":"63007512","categoryIdentifier":"670_1_A14_4","title":"AAAS"},
{"categoryName":"ACCT","categoryId":"63007490","categoryIdentifier":"670_1_A14_5","title":"ACCT"},
...
]
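To work with these responses rather than just print them, the body can be decoded with the standard json module. This is a minimal sketch, assuming every response is a JSON list of objects with the categoryName and categoryId keys shown above. The helper name parse_options is made up for illustration, and the sample string is the two Fall 2014 entries from above.

```python
import json


def parse_options(body):
    """Return (categoryName, categoryId) pairs from a dropdown response body.

    Assumes the body is a JSON list of objects shaped like the output above.
    """
    return [(item["categoryName"], item["categoryId"]) for item in json.loads(body)]


# The two Fall 2014 entries shown above, used here instead of a live response.
sample = '''[
{"categoryName":"AAAS","categoryId":"63420700","categoryIdentifier":"670_1_F14_4","title":"AAAS"},
{"categoryName":"ACCT","categoryId":"63420752","categoryIdentifier":"670_1_F14_5","title":"ACCT"}
]'''

departments = parse_options(sample)
print(departments)
```

In the loop above, the same helper would be called on the text of each response.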
Leaving the Course and Section parts to you as homework.
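As a starting point, the nesting could look roughly like the sketch below. It is not tested against the site. The values "dept" and "course" for the dropdown parameter are guesses by analogy with "term", and the function names, the fetch callback, and the canned data are all hypothetical. The real request is left to a caller-supplied fetch function so that the loop structure can be tried out without network access.

```python
def all_combinations(fetch, term_id):
    """Yield (department, course, section) name triples for one term.

    fetch(dropdown, ids) is supplied by the caller. It should perform the
    request for the given dropdown value with the given ids and return a
    list of (name, id) pairs.
    """
    for dept_name, dept_id in fetch("term", {"termId": term_id}):
        dept_ids = {"termId": term_id, "deptId": dept_id}
        # "dept" is an assumed dropdown value, not confirmed against the site.
        for course_name, course_id in fetch("dept", dept_ids):
            course_ids = dict(dept_ids, courseId=course_id)
            # "course" is likewise an assumed dropdown value.
            for section_name, _section_id in fetch("course", course_ids):
                yield dept_name, course_name, section_name


def fake_fetch(dropdown, ids):
    """Stand-in for the real request; returns made-up placeholder data."""
    canned = {
        "term": [("AAAS", "d1"), ("ACCT", "d2")],
        "dept": [("100", "c1")],
        "course": [("01", "s1"), ("02", "s2")],
    }
    return canned[dropdown]


combos = list(all_combinations(fake_fetch, "t1"))
for combo in combos:
    print(combo)
```

A real fetch would copy the ids into params, set params['dropdown'], post to xhr_url as in the loop above, and parse the JSON that comes back.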
Hope that helps.