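Can't scrape from a specific website using Python requests
I'm trying to scrape the URL below, but the response doesn't contain the content I see when I open the page in a browser (the content of the public customer case/story). I also tried simulating a real browser by sending headers, but no luck so far. Any suggestions?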
URL: https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365
import requests
main_url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
result = requests.get(main_url)
print(result.text)
The page uses an external API to fetch its data. You just need to call:
GET https://customers.microsoft.com/en-us/api/search?key=STORY_KEY
where STORY_KEY is 767633-asos-retailer-azure-active-directory-m365, i.e. the text after the last slash in the URL. You can use a Python script like the following:
import requests
url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
r = requests.get(
    "https://customers.microsoft.com/en-us/api/search",
    params={
        "key": url.rsplit('/', 1)[1]
    }
)
document = r.json()["search_document"]
summary = document["story_exec_summary"]
body = document["story_body_text_2"]
quote1 = document["story_quote_carousel"]
quote2 = document["story_quote_carousel_2"]
print(summary)
print(body)
print(quote1)
print(quote2)
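The script above takes the key with rsplit, which works for this exact URL. As a slightly more defensive sketch (the helper name story_key_from_url is made up for illustration, and it assumes the key is always the last path segment), the standard library's urllib.parse can be used so that a trailing slash or a query string does not end up in the key:

```python
from urllib.parse import urlparse


def story_key_from_url(url: str) -> str:
    """Return the last non-empty path segment of a story URL.

    Hypothetical helper: assumes the story key is the final path segment.
    """
    path = urlparse(url).path  # the path only, without ?query or #fragment
    return path.rstrip("/").rsplit("/", 1)[-1]


url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
print(story_key_from_url(url))
# prints: 767633-asos-retailer-azure-active-directory-m365
```

The returned string can be passed as the "key" parameter in place of url.rsplit('/', 1)[1].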
Note that you will need to search the document object (video, body3, etc.) for the data you are looking for.
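One way to do that search is to walk the parsed JSON and list every key whose name contains a given substring. This is only a sketch: find_fields is a made-up helper, and the sample dictionary below is invented stand-in data, not the real API response (the real field names may differ).

```python
def find_fields(obj, needle, path=""):
    """Yield (path, value) pairs for keys whose name contains `needle`."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else key
            if needle.lower() in key.lower():
                yield child, value
            # Keep descending so nested matches are found as well.
            yield from find_fields(value, needle, child)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from find_fields(item, needle, f"{path}[{i}]")


# Invented stand-in for r.json()["search_document"]; field names are assumptions.
sample = {
    "story_body_text_2": "first part",
    "story_body_text_3": "second part",
    "media": [{"story_video_url": "placeholder"}],
}

for field_path, value in find_fields(sample, "body"):
    print(field_path, "->", value)
```

With the real response, pass the document object instead of sample and try substrings such as "body", "quote" or "video".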
You need to handle certificates correctly. This requires some additional packages:
pip install certifi
pip install urllib3
We also need to use a different Python library, namely urllib3:
python
Python 3.7.7 (default, Mar 10 2020, 15:43:33)
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> import certifi
>>> import urllib3
>>>
>>> http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
>>> main_url = "https://customers.microsoft.com/en-us/story/767633-asos-retailer-azure-active-directory-m365"
>>>
>>> r = http.request('GET', main_url)
>>> r.status
200
>>> r.data
>>> open("stories.html", "wb").write(r.data)
Output:
>>> r.data
b'\r\n<!doctype html>\r\n<html lang="en" xml:lang="en" dir="ltr">\r\n<head prefix="og: http://ogp.me/ns#">\r\n <meta charset="utf-8" />\r\n <meta name="viewport" content="width=device-width, initial-scale=1.0" />\r\n <meta name="description" content="Microsoft customer stories. See how Microsoft tools help companies run their business.">\r\n <meta name="keywords" content="Microsoft, customers, stories, business, software, tools, services, use case, global, collaboration, vendor, story sear .....
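A status of 200 only shows that the page shell was delivered. To check whether the fetched HTML actually contains the story text seen in the browser (it may not, if the page fills in its content with JavaScript), the visible text can be extracted and searched. The sketch below uses only the standard library; TextExtractor and visible_text are made-up names, and the sample bytes are invented rather than taken from the real page.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html_bytes: bytes, encoding: str = "utf-8") -> str:
    """Return the text a reader would see, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(html_bytes.decode(encoding, errors="replace"))
    parser.close()
    return " ".join(parser.chunks)


# Invented sample: the story name appears only inside a script, not in the body.
sample = (
    b"<html><head><script>var s = 'ASOS story';</script></head>"
    b"<body><h1>Customer Stories</h1></body></html>"
)
text = visible_text(sample)
print("ASOS" in text)
```

In the session above, visible_text(r.data) could be searched for a phrase from the story in the same way.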
Let me know if this helps.