
Scrapy response is in a different language from the request and response URL

I'm trying to scrape search results from this page

http://eur-lex.europa.eu/search.html?qid=1437402891621&DB_TYPE_OF_ACT=advGeneral&CASE_LAW_SUMMARY=false&DTS_DOM=EU_LAW&typeOfActStatus=ADV_GENERAL&type=advanced&lang=fr&SUBDOM_INIT=EU_CASE_LAW&DTS_SUBDOM=EU_CASE_LAW

According to the URL the language is French (lang=fr), and that is the URL I see in the Scrapy shell after 'Crawled (200)'.

If I try response.url I also get a URL with lang=fr.

Viewing the page in a browser shows me French results.

However, the body of the response is English.

I've tried disabling cookies in my Scrapy settings.py file. I've also set 'Accept-Language': 'fr' in DEFAULT_REQUEST_HEADERS.
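
For reference, the two settings mentioned above would look roughly like this in settings.py. This is a minimal sketch that shows only those two settings; the rest of the project configuration is assumed to be unchanged.

```python
# settings.py (fragment)

# Do not store or send cookies between requests.
COOKIES_ENABLED = False

# Headers Scrapy attaches to every request unless a request overrides them.
DEFAULT_REQUEST_HEADERS = {
    "Accept-Language": "fr",
}
```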

Any ideas?

In the upper right corner of the page there is a drop-down for choosing the language of the website. Selecting French there adds another parameter to the URL: &locale=fr.

So add that parameter to the URL in your start_urls.
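
A minimal sketch of adding the parameter programmatically, using only the standard library. The helper name with_locale and the constant SEARCH_URL are made up for illustration; the URL is the one from the question. Appending "&locale=fr" to the string by hand works just as well.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_locale(url, locale="fr"):
    """Return url with its locale query parameter set to the given value."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous locale, if any.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "locale"
    ]
    query.append(("locale", locale))
    return urlunsplit(parts._replace(query=urlencode(query)))


# The search URL from the question (hypothetical constant name).
SEARCH_URL = (
    "http://eur-lex.europa.eu/search.html?qid=1437402891621"
    "&DB_TYPE_OF_ACT=advGeneral&CASE_LAW_SUMMARY=false&DTS_DOM=EU_LAW"
    "&typeOfActStatus=ADV_GENERAL&type=advanced&lang=fr"
    "&SUBDOM_INIT=EU_CASE_LAW&DTS_SUBDOM=EU_CASE_LAW"
)

# In a spider this list would be the class attribute start_urls.
start_urls = [with_locale(SEARCH_URL)]
```

Checking response.url in the Scrapy shell afterwards should show both lang=fr and locale=fr in the query string.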
