
Get all combinations from selenium drop down list without options or select tag

I am working with Selenium on the following website: http://calstate-la.bncollege.com/webapp/wcs/stores/servlet/TBWizardView?catalogId=10001&langId=-1&storeId=30556

My goal is to get all the possible combinations of department, course, and section from their respective drop-down menus. The main problem is that I can't think of any way to get the values out of those menus.

Other Stack Overflow questions similar to mine suggest solutions that work with select and option tags. However, when I looked at the page source, there were no such tags for the drop-down menus.

So I need help getting all the combinations from the drop-down menus, but I don't know how to proceed in this special case. I should also mention that I am working with Python.

I actually tried to use Selenium here, but it becomes painful very fast because of the asynchronous nature of the page and its "artificial" dropdowns.

Here's an alternative approach using requests and BeautifulSoup (no browser needed at all).

The idea is to simulate the underlying requests that fill the dropdowns:

from bs4 import BeautifulSoup
import requests

CATALOG = 10001
STORE = 30556

# Page that hosts the textbook wizard and its custom (non-<select>) dropdowns
url = ('http://calstate-la.bncollege.com/webapp/wcs/stores/servlet/TBWizardView'
       '?catalogId={catalog}&langId=-1&storeId={store}').format(catalog=CATALOG, store=STORE)
# Endpoint the page calls in the background to fill each dropdown
xhr_url = 'http://calstate-la.bncollege.com/webapp/wcs/stores/servlet/TextBookProcessDropdownsCmd'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36'}

# A session keeps the cookies set by the first page load for the later requests
session = requests.Session()
response = session.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# The campus id lives in a hidden input; the terms are plain <li> elements
campus = soup.find('input', attrs={'name': 'campus1'}).get('value')
book_row = soup.find('div', class_='bookRowContainer')

params = {
    'campusId': campus,
    'deptId': '',
    'courseId': '',
    'sectionId': '',
    'storeId': STORE,
    'catalogId': CATALOG,
    'langId': '-1',
    'dropdown': 'term'
}

# Ask for the departments of each term in turn
terms = book_row.select('li.termOption')
for term in terms:
    params['termId'] = term.get('data-optionvalue')
    response = session.post(xhr_url, params=params, headers=headers)
    print(response.text)

This prints the departments for each term in JSON format.

For Fall 2014:

[
    {"categoryName":"AAAS","categoryId":"63420700","categoryIdentifier":"670_1_F14_4","title":"AAAS"},
    {"categoryName":"ACCT","categoryId":"63420752","categoryIdentifier":"670_1_F14_5","title":"ACCT"},
    ...
]

For Summer 2014:

[
    {"categoryName":"AAAS","categoryId":"63007512","categoryIdentifier":"670_1_A14_4","title":"AAAS"},
    {"categoryName":"ACCT","categoryId":"63007490","categoryIdentifier":"670_1_A14_5","title":"ACCT"},
    ...
]
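To work with these responses rather than just print them, each payload can be parsed into (name, id) pairs. The following is a minimal sketch that assumes the response body has exactly the shape shown above; the helper name parse_departments is made up for illustration, and the sample string reuses the two Fall 2014 entries from the output.

```python
import json


def parse_departments(payload):
    """Return a list of (categoryName, categoryId) tuples from a JSON payload."""
    entries = json.loads(payload)
    return [(entry["categoryName"], entry["categoryId"]) for entry in entries]


# Sample copied from the Fall 2014 output above (first two entries only)
sample = (
    '[{"categoryName":"AAAS","categoryId":"63420700",'
    '"categoryIdentifier":"670_1_F14_4","title":"AAAS"},'
    '{"categoryName":"ACCT","categoryId":"63420752",'
    '"categoryIdentifier":"670_1_F14_5","title":"ACCT"}]'
)

departments = parse_departments(sample)
print(departments)
```

In the loop above, parse_departments(response.text) would take the place of the print call. The categoryId values are presumably what the next request expects as deptId, but that is an assumption to verify against the requests the page itself makes.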

I'm leaving the course and section parts to you as homework.
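As a starting point for that homework, the nesting itself can be sketched independently of the site. The code below is only an illustration: fetch_children is a hypothetical callable that you would implement with session.post against the same endpoint, and the level names and the FAKE_TREE data are invented so the example runs without network access. The 'dropdown' values needed for the course and section requests are not shown here and would have to be read from the browser's network tab.

```python
def enumerate_combinations(fetch_children, levels, parent_ids=()):
    """Yield one tuple of option names for every full path through the levels.

    fetch_children(level, parent_ids) must return a list of (name, id) pairs
    for the given dropdown level, given the ids already chosen above it.
    """
    if not levels:
        yield ()
        return
    level, rest = levels[0], levels[1:]
    for name, option_id in fetch_children(level, parent_ids):
        deeper = enumerate_combinations(fetch_children, rest, parent_ids + (option_id,))
        for tail in deeper:
            yield (name,) + tail


# Invented stand-in for the real server responses
FAKE_TREE = {
    ("dept", ()): [("ACCT", "d1"), ("MATH", "d2")],
    ("course", ("d1",)): [("200A", "c1")],
    ("section", ("d1", "c1")): [("01", "s1"), ("02", "s2")],
    ("course", ("d2",)): [],  # a department with no courses yields nothing
}


def fake_fetch(level, parent_ids):
    return FAKE_TREE.get((level, parent_ids), [])


combos = list(enumerate_combinations(fake_fetch, ["dept", "course", "section"]))
print(combos)
```

Passing the fetcher in as an argument keeps the traversal separate from the HTTP details, so the recursion can be checked with fake data before it is pointed at the real site once per term.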

Hope that helps.
