
Facing 403 error while scraping Indeed using Python

I need to scrape 'https://in.indeed.com/' using requests. When I run the code, it returns a 403 error.

Can anyone tell me the solution?

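import requests

url = "https://in.indeed.com"

# Browser-like User-Agent; adjacent string literals are joined into one value
hdr = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}

result = requests.get(url, headers=hdr)

print(result)  # prints the Response object, e.g. <Response [403]> when blocked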

So far I have only tried to check the status code of the website, and it just shows the error.

Note: I need to do the web scraping without using Selenium.

It appears that some headers are missing from your request. I also get a 403 when I make the request like that. However, a copied cURL request works.

Try the following:

  1. Open the website in Chrome (or Firefox)
  2. Open the developer console
  3. Copy the request from the first GET as cURL. You can try out the cURL request in your console; it worked for me without any 403.
  4. Set headers in your code similar to those in the cURL request
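
As a rough sketch of step 4, the headers can be pulled out of the copied cURL command instead of being retyped by hand. Everything named here is an assumption for illustration: `headers_from_curl` and `fetch` are hypothetical helpers, and `SAMPLE_CURL` is a made-up stand-in for whatever your browser actually copies (only the User-Agent comes from the question). There is no guarantee that any particular set of headers avoids the 403.

```python
import shlex

# Illustrative stand-in for a command copied from the browser; the real
# one will contain more headers and different values.
SAMPLE_CURL = r"""curl 'https://in.indeed.com/' \
  -H 'accept: text/html,application/xhtml+xml' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36' \
  --compressed"""


def headers_from_curl(curl_command):
    """Collect the -H/--header values of a cURL command into a dict."""
    # Join shell line continuations (backslash + newline) before tokenizing.
    tokens = shlex.split(curl_command.replace("\\\n", " "))
    headers = {}
    # Look at each token together with the one that follows it.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header") and ":" in value:
            name, _, content = value.partition(":")
            headers[name.strip()] = content.strip()
    return headers


def fetch(url, headers):
    """Send a GET request with the given headers (hypothetical helper)."""
    import requests  # third-party; imported here so parsing works without it
    return requests.get(url, headers=headers, timeout=10)
```

Usage would look like `result = fetch("https://in.indeed.com", headers_from_curl(SAMPLE_CURL))` followed by `print(result.status_code)`, with `SAMPLE_CURL` replaced by the command you copied.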


(However, I'm assuming they take some measures against web scraping, so you may run into further problems. I'm guessing that you also have to save the cookie or something like that.)
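
If cookies do turn out to matter, one possible way to keep them without Selenium is a `requests.Session`, which stores cookies from each response and sends them on later requests. This is only a sketch of that guess: `make_session` and `fetch_with_cookies` are hypothetical names, and loading the landing page first is an assumption about where the cookies would be set, not something confirmed for this site.

```python
import requests


def make_session(headers):
    """Return a Session that sends `headers` and keeps cookies between requests."""
    session = requests.Session()
    session.headers.update(headers)  # applied to every request on this session
    return session


def fetch_with_cookies(session, url):
    """Load the landing page first so its cookies are stored, then fetch `url`."""
    # Assumption: any cookies the site sets arrive with this first response.
    session.get("https://in.indeed.com/", timeout=10)
    return session.get(url, timeout=10)
```

The session can be created with the same header dict used for the plain `requests.get` call, for example `session = make_session(hdr)`.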
