
How to scrape information from a website that doesn't use POST

Original 2018-09-02 13:11:57 6 1 python/ web-scraping/ scrapy/ html-select

I need to get some information from a website that uses an HTML select to filter its content. However, I'm having difficulty doing so: when the value of the select changes, the website does not reload; it uses some internal function to fetch the new content.

The webpage in question is this, and when I use the Chrome developer tools to see what happens when I change the value of the select, I get a call that looks like this:

index.php?eID=dmmjobcontrol&type=discipline&uid=77&_=1535893178522

Interestingly, the uid is the id of the selected option, so the request carries the correct id. However, when I open this link directly I just get a page saying null.
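As a rough illustration of the request above, here is a minimal sketch that rebuilds the URL and sends it the way a browser's AJAX call typically would. Several things here are assumptions, not facts from the question: the base URL `https://example.com/` is a placeholder, the trailing `_` parameter is treated as a millisecond timestamp used for cache busting (jQuery adds one like it), the `X-Requested-With` and `Referer` headers are only guesses at what the server might check, and the response is assumed to be JSON (a bare `null` is valid JSON, which may simply mean "no data for this request").

```python
import json
import time
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

# Headers a browser-side AJAX call commonly sends (assumption: the server may check them).
AJAX_HEADERS = {
    "X-Requested-With": "XMLHttpRequest",
    "Accept": "application/json, text/javascript, */*; q=0.01",
}


def build_ajax_url(base_url, uid, timestamp_ms=None):
    """Rebuild the index.php call seen in the developer tools for a given option id."""
    if timestamp_ms is None:
        # Assumed cache-busting value: current time in milliseconds.
        timestamp_ms = int(time.time() * 1000)
    query = urlencode({
        "eID": "dmmjobcontrol",
        "type": "discipline",
        "uid": uid,
        "_": timestamp_ms,
    })
    return urljoin(base_url, "index.php?" + query)


def fetch_filtered(url, referer):
    """Send the request with AJAX-style headers and decode the body as JSON."""
    request = Request(url, headers={**AJAX_HEADERS, "Referer": referer})
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder base URL; replace with the real site before calling fetch_filtered.
    print(build_ajax_url("https://example.com/", 77))
```

If the headers turn out to matter, the same URL and `AJAX_HEADERS` can be passed to `scrapy.Request(url, headers=AJAX_HEADERS, callback=...)` inside a spider. If the response is still null, the server may also depend on a session cookie set by the page itself.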

Compare this with a similar website, this one. When I change the select there, I get form data which I can use to get the information I want.

I'm fairly new to scraping and honestly I don't understand how I can get this information. In case it is useful, I'm using Scrapy in Python to parse the information from the websites.

1 answer

One solution is to use a client layer that executes both your scraping "script" and all the JavaScript sent by the website, simulating a real browser. I'm successfully using PhantomJS for this together with Selenium, aka the WebDriver API: https://selenium-python.readthedocs.io/getting-started.html

Note that historically Selenium was the first product to do this, hence the name of this API. In my opinion PhantomJS is better suited: it is headless by default (it doesn't run any GUI process) and faster. Both Selenium and PhantomJS implement a protocol called WebDriver, which your Python program would use.

It may sound complicated, but please just follow the Getting Started documentation cited above and check whether it suits you.

EDIT: this article also contains a simple example of using the described setup: https://realpython.com/headless-selenium-testing-with-python-and-phantomjs/

Note that in many articles people do a similar thing for testing, so the term "scraping" is not even mentioned. Technically it's the same: emulating a user clicking in the browser and, at the end, getting data from specific page elements.
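The approach in this answer (a WebDriver client driving a headless browser) could look roughly like the sketch below. It swaps in headless Chrome for PhantomJS, because Selenium 4 no longer ships a PhantomJS driver. The function name, the CSS selectors, and the option value are hypothetical placeholders, and the code assumes Selenium 4 plus a local Chrome installation.

```python
def scrape_after_select(page_url, select_css, option_value, result_css, timeout=10):
    """Choose an option in a <select>, wait for results, and return their text."""
    # Imported lazily so this module can be loaded where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select, WebDriverWait

    options = Options()
    options.add_argument("--headless")  # no GUI window, similar to PhantomJS
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(page_url)
        # Changing the option triggers the site's own JavaScript, as a user would.
        dropdown = Select(driver.find_element(By.CSS_SELECTOR, select_css))
        dropdown.select_by_value(option_value)
        # Wait until at least one result element is present. If results are
        # already on the page before the change, a stricter condition is needed.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, result_css))
        )
        return [el.text for el in driver.find_elements(By.CSS_SELECTOR, result_css)]
    finally:
        driver.quit()  # always release the browser process
```

A call might look like `scrape_after_select("https://example.com/jobs", "select#discipline", "77", "div.job-item")`, where every argument is a placeholder to be replaced with values read from the real page.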

How scrape a website in which i post information

How can I scrape information from HowLongToBeat.com? It doesn't use a variable in the URL

Python: how to scrape information from a website?

How to scrape movies information from the IMDB website?

How to scrape information from website that requires login

How to scrape EXACT information from a crypto website

How to scrape information from a website and skip to the next point if the information is not existing

How to get scrape information from a textbook buyback website?

Scrape data from a website that URL doesn't change

How do I scrape a table from a website with Python that doesn't have an ID tag or class?


The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.



solution1 1 ACCEPTED 2018-09-02 15:48:18
