Why does Scrapy get a 403 response from Udemy?

Question

I was trying to use scrapy shell to inspect the result of response.css on a page. The snippet I was using is response.css("title::text").extract(), which normally returns the title of the web page. For Udemy, however, it returns an access-denied message instead. The same snippet works fine for Amazon. Any comments?

scrapy shell "https://www.udemy.com/courses/search/?q=python&src=sac&kw=python"
response.css("title::text").extract()
['Access to this page has been denied.']

On the other hand, the command below works fine:

scrapy shell "https://www.amazon.com/s?k=garlic+press&crid=2DY5U90PELGKN&sprefix=garlic+pres%2Caps%2C286&ref=nb_sb_ss_i_1_11"

response.css("title::text").extract()
['Amazon.com: garlic press']

EDIT:

scrapy shell --set=USER_AGENT='Mozilla/5.0' "https://www.udemy.com/courses/search/?q=python&src=sac&kw=python"
response.css("h4::text").extract()
[]

Answer 1

Udemy is trying to prevent automated scraping. It returns an HTTP 403 response, and the body of that response contains text stating:

Access to this page has been denied because we believe you are using automation tools to browse the website.

They block requests whose User-Agent HTTP header does not look like a client they want accessing their content, and Scrapy's default User-Agent identifies itself as Scrapy. Luckily, headers can be spoofed:

scrapy shell --set=USER_AGENT='Mozilla/5.0' "https://www.udemy.com/courses/search/?q=python&src=sac&kw=python"

This ought to work, though I don't have Python/Scrapy on this machine, so I haven't tested it.
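To check whether the User-Agent header alone changes the response, independent of Scrapy, here is a minimal sketch using only the Python standard library. The helper names build_request and fetch_status are made up for illustration, and nothing here guarantees that a given User-Agent value will actually get past Udemy's protection; it only reports the status code the server returns.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a GET request that carries a custom User-Agent header.

    Hypothetical helper for illustration; nothing is sent yet.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10.0):
    """Send the request and return the HTTP status code.

    urllib raises HTTPError for 4xx/5xx responses, so a 403 is
    caught here and reported via err.code instead of propagating.
    """
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling fetch_status with the Udemy search URL once with "Mozilla/5.0" and once with a Scrapy-style User-Agent string would show whether the header makes a difference. If the status is 200 but selectors such as h4::text still come back empty, the content may be rendered by JavaScript rather than present in the initial HTML.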

Edit: I'm not certain about the legality of circumventing their bot protection. Make sure to check your local laws before you use this advice.

1 answers

solution1
1 2020-01-17 18:47:59