
Can't scrape from a specific website using python requests

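I'm trying to scrape the URL below, but the response doesn't contain what I see when I open the page in a browser (the content of the public customer case/story). I've also tried simulating a real browser with headers, but no luck so far. Any suggestions?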

URL: https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365

import requests
main_url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
result = requests.get(main_url)   
print(result.text)

The page uses an external API to fetch its data. You just need to make this call:

GET https://customers.microsoft.com/en-us/api/search?key=STORY_KEY

Here STORY_KEY is the text after the last slash in the URL, for example 767633-asos-retailer-azure-active-directory-m365. You can use a script like the following:

import requests

url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"

# The story key is the last path segment of the story URL.
r = requests.get(
    "https://customers.microsoft.com/en-us/api/search",
    params={
        "key": url.rsplit('/', 1)[1]
    }
)
r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
document = r.json()["search_document"]

# Pick out the fields of interest from the returned document.
summary = document["story_exec_summary"]
body = document["story_body_text_2"]
quote1 = document["story_quote_carousel"]
quote2 = document["story_quote_carousel_2"]

print(summary)
print(body)
print(quote1)
print(quote2)
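The `rsplit` in the script above assumes the story URL has no trailing slash or query string. A slightly more defensive way to get the key might look like the sketch below; `story_key_from_url` is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlsplit


def story_key_from_url(url):
    """Return the last path segment of a story URL (hypothetical helper)."""
    # urlsplit separates the path from any query string or fragment.
    path = urlsplit(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
print(story_key_from_url(url))
```

The result can be passed as the `key` parameter in place of `url.rsplit('/', 1)[1]`.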

Note that you will need to search the document object (video, body3, etc.) for the data you are looking for.
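One way to do that search is to walk the JSON and list every key whose name contains a keyword. The sketch below is only an illustration: `find_fields` is a hypothetical helper, and `sample_document` is a made-up stand-in for `r.json()["search_document"]`, so the real field names may differ.

```python
def find_fields(node, keyword, path=""):
    """Yield (path, value) for every dict key whose name contains keyword."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else key
            if keyword in key:
                yield child_path, value
            # Keep descending in case matching keys are nested deeper.
            yield from find_fields(value, keyword, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_fields(item, keyword, f"{path}[{index}]")


# Made-up sample data; the real document comes from the API response.
sample_document = {
    "story_exec_summary": "summary text",
    "story_body_text_2": "second body block",
    "story_body_text_3": "third body block",  # hypothetical key
    "story_quote_carousel": "a quote",
}

body_fields = dict(find_fields(sample_document, "body"))
for name, value in body_fields.items():
    print(name, "->", value)
```

With the real response, replace `sample_document` with `document` and try keywords such as "body", "quote", or "video".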

You need to handle certificates properly. This requires additional packages:

pip install certifi
pip install urllib3

We also need to use a different Python library, namely urllib3:

python
Python 3.7.7 (default, Mar 10 2020, 15:43:33)
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> import certifi
>>> import urllib3
>>>
>>> http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
>>> main_url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
>>>
>>> r = http.request('GET', main_url)
>>> r.status
200
>>> r.data

>>> open("stories.html", "wb").write(r.data)

Output:

>>> r.data
b'\r\n<!doctype html>\r\n<html lang="en" xml:lang="en" dir="ltr">\r\n<head prefix="og: http://ogp.me/ns#">\r\n    <meta charset="utf-8" />\r\n    <meta name="viewport" content="width=device-width, initial-scale=1.0" />\r\n    <meta name="description" content="Microsoft customer stories. See how Microsoft tools help companies run their business.">\r\n    <meta name="keywords" content="Microsoft, customers, stories, business, software, tools, services, use case, global, collaboration, vendor, story sear .....
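For comparison, the same kind of certificate verification can be set up with the standard library alone. This is a minimal sketch, assuming the system trust store is acceptable in place of the certifi bundle; `make_verifying_context` and `fetch` are hypothetical helper names, and `fetch` is defined but not called here.

```python
import ssl
import urllib.request


def make_verifying_context(ca_file=None):
    """Build a TLS context that verifies certificates and hostnames."""
    # With ca_file=None the system's default CA certificates are loaded.
    return ssl.create_default_context(cafile=ca_file)


def fetch(url, context, timeout=30):
    """Return (status, body bytes) for url, verifying TLS with context."""
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return response.status, response.read()


context = make_verifying_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

To use the certifi bundle instead, pass `certifi.where()` as `ca_file`; calling `fetch(main_url, context)` would then correspond to the `http.request('GET', main_url)` call in the session above.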

Let me know if this helps.
