I'm trying to scrape a website that shows a loading screen. When I browse the site it shows "loading.." for a second and then the content loads. The problem is that when I try to scrape it with Scrapy,
I get nothing back (probably because of that loading screen). Can I solve this with Scrapy,
or should I use some other tool? Here's the link to the website if you want to see it: https://www.graana.com/project/601/lotus-lake-towers
The page sends a GET request to fetch the property information, so you should mimic that request in your code. (You can observe the GET call in the browser console under Network -> XHR.)
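    # -*- coding: utf-8 -*-
    import json

    import scrapy


    class GranaSpider(scrapy.Spider):
        name = 'grana'
        # allowed_domains must be a list of domains, not a bare string
        allowed_domains = ['www.graana.com']
        # Request the JSON API endpoint that the page calls, not the HTML page.
        # To scrape several projects, add more API URLs to this list.
        start_urls = ['https://www.graana.com/api/area/slug/601']

        def parse(self, response):
            # Scrapy has already fetched the start URL; decode its JSON body
            data = json.loads(response.text)
            print(data)
            # convert the decoded JSON and save it to your storage system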
The output is in JSON format; convert it to whatever form is convenient for you.
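One possible way to do that conversion, as a minimal sketch: decode the body with the standard json module and flatten the records into CSV. The helper name json_to_csv, the sample payload, and its field names (id, name, city) are invented for illustration; the real API response will almost certainly have a different shape.

```python
import csv
import io
import json


def json_to_csv(body, fields):
    """Decode a JSON body (bytes or str) holding a list of objects and return CSV text."""
    records = json.loads(body)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator='\n')
    writer.writeheader()
    for record in records:
        # Keep only the requested fields; missing ones become empty cells
        writer.writerow({field: record.get(field, '') for field in fields})
    return buffer.getvalue()


# Hypothetical payload, standing in for response.body from the spider
sample_body = b'[{"id": 601, "name": "Lotus Lake Towers", "city": "Somewhere", "extra": 1}]'
csv_text = json_to_csv(sample_body, ['id', 'name', 'city'])
print(csv_text)
```

Inside the spider you would pass response.body (or response.text) instead of sample_body and choose the fields after inspecting the actual response.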
I know this question is old and already answered but I wanted to share my solution after encountering a similar problem. The accepted answer was not helpful to me because I was not using scrapy.
Scrape websites that first display a loading page and then display the actual content.
An example of such a website is the myjob.mu results page used in the code below.
The requests library will not work for such websites. In my experience, requests.get(URL, headers=HEADERS)
simply times out.
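Use Selenium. Load the page with driver.get(URL), wait a few seconds for the loading screen to finish,
then read the rendered HTML from driver.page_source.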
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    import time

    # the following options are only for setup purposes
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    driver = webdriver.Chrome(options=chrome_options)

    URL = "https://www.myjob.mu/ShowResults.aspx?Keywords=&Location=&Category=39&Recruiter=Company&SortBy=MostRecent"
    driver.get(URL)
    time.sleep(5)  # wait out the loading screen; any number > 3 should work fine
    html = driver.page_source
    driver.quit()  # close the browser once the HTML has been captured
    print(html)
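A fixed sleep either wastes time or is too short on a slow day. As a rough alternative (not what the answer above does), you could poll the page source until some piece of the real content shows up. The helper wait_for_content, its parameters, and the marker string are all hypothetical names for this sketch; a fake page source stands in for the browser so the example runs without Selenium.

```python
import time


def wait_for_content(get_html, marker, timeout=10.0, interval=0.5):
    """Poll get_html() until marker appears in the result or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a browser: returns the loading screen twice, then the real page
pages = iter(["<p>loading..</p>", "<p>loading..</p>", "<div id='results'>jobs</div>"])
html = wait_for_content(lambda: next(pages), "id='results'", timeout=2.0, interval=0.01)
print(html)
```

With Selenium, get_html would be something like lambda: driver.page_source, and marker a string you know only appears once the content has loaded. Selenium also provides WebDriverWait for explicit waits, which is probably the more idiomatic choice in real code.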
The BeautifulSoup library can then be used for parsing the HTML.
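As a small sketch of that parsing step, assuming the bs4 package is installed: the markup below is made up for illustration and does not reflect the real structure of the job site, so the tag and class names would need to be adapted after inspecting the actual page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source
html = """
<div class="job"><a href="/job/1">Data Analyst</a></div>
<div class="job"><a href="/job/2">Web Developer</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

jobs = []
for div in soup.find_all("div", class_="job"):
    link = div.find("a")
    # Collect the link text and its target for each job entry
    jobs.append((link.get_text(strip=True), link["href"]))

print(jobs)
```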