Why can't my LinkedIn scraper load the requested URL? (Python)
I am trying to scrape LinkedIn. The script worked for 3 months, but yesterday it crashed.
I use Selenium WebDriver and Firefox with a fake user agent.
The URL is https://www.linkedin.com/company/my_company/
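```python
import logging

from fake_useragent import UserAgent
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as FirefoxOptions


def init_driver():
    """Initiates selenium webdriver.
    :return: Firefox browser instance
    """
    try:
        options = FirefoxOptions()
        # use random UserAgent to avoid captcha; the preference is set on the
        # options, since a FirefoxProfile that is never passed to the driver has no effect
        options.set_preference("general.useragent.override", UserAgent().random)
        # options.add_argument("--headless")
        # initiate driver
        return webdriver.Firefox(options=options)
    except Exception:
        # returns None when the driver cannot be started
        logging.error('Exception occurred initiating webdriver', exc_info=True)
```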
Then I just open the page with `driver.get(url)`.
At this point the browser opens it, but the page does not load.
The same thing happens without the fake user agent and when using Chrome.
Has anyone encountered something like this? When I open the link myself, everything is OK.
https://www.linkedin.com/authwall?trk=gf&trkInfo=AQFvPeNP8NQIxwAAAXLqc-uI5rnQe1ZIysPcZOgjZCzbrBHZj7q6gd68fPG9NzbX00Rlre_yC0tITChjMDEXSNnD8tZRaMXqcRG-z_3QUMlCvQPR4uVGBQYoSOl3ycoO2E6Jl9w=&originalReferer=&sessionRedirect=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2my_company%2F
Other URLs open without problems using the same function.
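The address above is LinkedIn's `authwall` page, and its `sessionRedirect` query parameter carries the percent-encoded company URL that was originally requested. Below is a minimal sketch for recognising that situation from a URL string using only the standard library. The helper names `is_authwall` and `requested_page` are hypothetical, and it assumes LinkedIn keeps this URL layout, which it may change at any time:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def is_authwall(url: str) -> bool:
    """True if `url` points at LinkedIn's /authwall page (assumed URL layout)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    on_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
    return on_linkedin and parsed.path.rstrip("/") == "/authwall"


def requested_page(url: str) -> Optional[str]:
    """Return the decoded `sessionRedirect` target of an authwall URL, else None."""
    if not is_authwall(url):
        return None
    # parse_qs splits the query string and percent-decodes each value,
    # so "https%3A%2F%2F..." comes back as "https://...".
    values = parse_qs(urlparse(url).query).get("sessionRedirect")
    return values[0] if values else None
```

In a Selenium script this check would be applied to `driver.current_url` after `driver.get(url)` returns.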
This is how you should modify your code. I modified your code, and it executed correctly:
```python
import logging

from fake_useragent import UserAgent
from selenium import webdriver
from selenium.webdriver.firefox.service import Service


def init_driver():
    """Initiates selenium webdriver.
    :return: Firefox browser instance
    """
    path = r"your firefox driver path"  # path to the geckodriver executable
    try:
        options = webdriver.FirefoxOptions()
        # use random UserAgent to avoid captcha
        options.set_preference("general.useragent.override", UserAgent().random)
        # options.add_argument("--headless")
        # initiate driver with an explicit geckodriver path (Selenium 4 Service API)
        return webdriver.Firefox(service=Service(executable_path=path), options=options)
    except Exception:
        logging.error('Exception occurred initiating webdriver', exc_info=True)


url = "your url"
driver = init_driver()
if driver is not None:  # init_driver() returns None if the driver failed to start
    driver.get(url)
```
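Because `driver.get(url)` does not raise an error when a site redirects to a login wall, it can help to check where the browser actually ended up. A minimal sketch, assuming that comparing host and path is enough to tell the requested page from a redirect; `landed_on_requested_page` is a hypothetical helper, and `driver` can be any object with a `get()` method and a `current_url` attribute, such as a Selenium WebDriver:

```python
from urllib.parse import urlparse


def landed_on_requested_page(driver, url: str) -> bool:
    """Load `url` with `driver` and report whether the browser stayed on it.

    Assumption: comparing host and path (ignoring the query string and a
    trailing slash) is enough to detect a redirect to another page, such as
    a login wall.
    """
    driver.get(url)  # Selenium follows redirects here without raising
    requested = urlparse(url)
    landed = urlparse(driver.current_url)
    return (
        requested.hostname == landed.hostname
        and requested.path.rstrip("/") == landed.path.rstrip("/")
    )
```

With the code above, this would take the place of the final `driver.get(url)` call, so the script can stop before trying to scrape a page it never reached.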