
Python requests like a browser?

I want to fetch a web document from 'https://www.fanfiction.net/s/5218118/1/', but I cannot replicate my browser's behaviour: the server always responds with something along the lines of "please enable cookies" or "complete Captcha". Is there a way to send requests like a browser, so that the server delivers the same document it would deliver to a browser? I have already searched and tried using cookies and a fake user agent. Here is my code:

import requests
from fake_useragent import UserAgent

url = 'https://www.fanfiction.net/s/5218118/1/'

ua = UserAgent()
S = requests.Session()

header = {'User-Agent':str(ua.chrome)}
res = S.get(url, headers=header)
cookies = dict(res.cookies)


response = S.get(url, headers=header, cookies=cookies)

Thanks in advance.

EDIT: I know that I could use selenium, but I do not want to keep updating my chromedriver, and I do not want to waste performance on selenium either.

Saw your EDIT, but just in case ...

A simple example with selenium that gives you the story text:

from selenium import webdriver
from bs4 import BeautifulSoup


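# Raw string, so the backslashes in the Windows path are not read as escape sequences.
# Newer Selenium 4 releases expect the path via service=Service(path) instead.
browser = webdriver.Chrome(r'C:\Program Files\ChromeDriver\chromedriver.exe')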
browser.get('https://www.fanfiction.net/s/5218118/1/')

soup=BeautifulSoup(browser.page_source, 'lxml')

print(soup.select_one('#storytext').get_text())

browser.close()

Edit

Edited based on your question and the fact that the site is protected by Cloudflare to guard against DDoS attacks.

You could extract the tag texts with selenium alone, but as in the example above, I use BeautifulSoup.

You are right: inspect the HTML with the developer tools and the tag part looks like this:

 <span class="xgray xcontrast_txt"> Rated: <a class="xcontrast_txt" href="https://www.fictionratings.com/" target="rating">Fiction T</a> - English - Romance/Adventure - Naruto U., Hinata H. - Chapters: 6 - Words: 14,894 - Reviews: <a href="/r/13747729/">5</a> - Favs: 29 - Follows: 24 - Updated: <span data-xutime="1610096566">33m ago</span> - Published: <span data-xutime="1605552788">Nov 16, 2020</span> - id: 13747729 </span>

It is a span with the classes xgray and xcontrast_txt, so we select it like this:

tags = soup.select_one('span.xgray.xcontrast_txt').get_text(strip=True)
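
Once the text of that span is available, it can be split into individual fields. The following is only a sketch under stated assumptions: it assumes the text was obtained with get_text(' ', strip=True), so that the link texts stay separated from their labels by a space, and it assumes that data-xutime holds a Unix timestamp in seconds. The helper names parse_story_meta and xutime_to_utc are hypothetical and not part of BeautifulSoup or selenium.

```python
from datetime import datetime, timezone

# Assumed input: the span text from the HTML above, taken with
# get_text(' ', strip=True) so every label keeps a space before its value.
tags = ("Rated: Fiction T - English - Romance/Adventure - Naruto U., Hinata H. "
        "- Chapters: 6 - Words: 14,894 - Reviews: 5 - Favs: 29 - Follows: 24 "
        "- Updated: 33m ago - Published: Nov 16, 2020 - id: 13747729")


def parse_story_meta(text):
    """Split the ' - ' separated line into labelled and unlabelled fields."""
    labelled, unlabelled = {}, []
    for part in text.split(" - "):
        key, sep, value = part.partition(": ")
        if sep:
            # "Words: 14,894" becomes labelled["Words"] = "14,894"
            labelled[key.strip()] = value.strip()
        else:
            # Fields without a label, such as language, genre and characters
            unlabelled.append(part.strip())
    return labelled, unlabelled


def xutime_to_utc(xutime):
    """Convert a data-xutime value (assumed to be Unix seconds) to a UTC datetime."""
    return datetime.fromtimestamp(int(xutime), tz=timezone.utc)


labelled, unlabelled = parse_story_meta(tags)
print(labelled["Rated"], int(labelled["Words"].replace(",", "")))
print(unlabelled)
print(xutime_to_utc("1605552788").date())
```

Splitting on ' - ' rather than on a bare hyphen keeps values such as "Romance/Adventure" or hyphenated titles intact, but it would still break on a field that itself contains ' - '.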

You may also want to read up on BeautifulSoup.

Example

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup


# Raw string, so the backslashes in the Windows path are not read as escape sequences.
browser = webdriver.Chrome(r'C:\Program Files\ChromeDriver\chromedriver.exe')
browser.get('https://www.fanfiction.net/s/5218118/4/Yet-again-with-a-little-extra-help')

try:
    # wait up to 10 seconds until the element with id 'storytext' is present
    element = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.ID, 'storytext'))
    )
    
    soup=BeautifulSoup(browser.page_source, 'lxml')

    storytext = soup.select_one('#storytext').get_text()
    tags = soup.select_one('span.xgray.xcontrast_txt').get_text(strip=True)
    
    print(tags)
    print(storytext)
    
finally:
    browser.close()
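
The explicit wait in the example above keeps checking a condition until it holds or a timeout expires. A minimal standard-library sketch of that polling idea follows. It is not selenium's actual implementation, and wait_until with its parameters is a hypothetical name chosen for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return the truthy result itself, e.g. the element that was found
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Usage: a stand-in condition that only succeeds on its third call
attempts = []

def ready_on_third_call():
    attempts.append(1)
    return "ready" if len(attempts) >= 3 else None

print(wait_until(ready_on_third_call, timeout=2.0, poll_interval=0.01))
```

Returning the condition's result rather than True lets the caller use whatever the condition found, which mirrors how the element is assigned from the wait in the example above.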
