
How to crawl in Python when the website blocks content from being crawled?

Original post: 2020-03-05 08:54:06 · Tags: python / web-scraping / beautifulsoup / web-crawler

I'm a beginner in Python and I'm trying to crawl with BeautifulSoup. I want to crawl a website to collect product information.

# attrs must be a dict ("class": value), not a set ("class", value)
pr_url = soup.find_all("li", {"class": "_3FUicfNemK"})
pr_url

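Everything is the same as my other crawling code that uses BeautifulSoup. The problem is that nothing comes back, even though I wrote the correct element and class.

So I think the host has blocked the product area from being crawled, because every element outside that area can be crawled.

Do you know how to crawl this blocked area? The site URL is: https://shopping.naver.com/living/homeliving/category?menu=10004487&sort=POPULARITY

Thanks in advance for your comments!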

1 Answer

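Notice that when you first load the page, the outline of the site loads, but the products take a while to appear? That's because the site makes requests in the background to load the rest of the content. This content isn't blocked, it's just loaded later :) The HTML that your script downloads is the page before those background requests run, so the product elements aren't in it yet.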
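To see the effect, compare what a parser finds before and after the JavaScript has run. This is only an illustrative sketch: the two HTML strings are made-up stand-ins (not the real Naver markup), and it uses the standard library's `html.parser` instead of BeautifulSoup so it runs without any installs.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count <li> start tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and self.css_class in classes:
            self.count += 1


def count_items(html, css_class):
    """Return how many <li class="css_class"> elements appear in html."""
    parser = ClassCounter(css_class)
    parser.feed(html)
    return parser.count


# Hypothetical stand-in for what a plain HTTP request receives:
# an empty shell that JavaScript fills in later.
STATIC_HTML = (
    '<html><body><div id="root"></div>'
    '<script src="app.js"></script></body></html>'
)

# Hypothetical stand-in for the page after the background requests finished.
RENDERED_HTML = (
    '<html><body><div id="root"><ul>'
    '<li class="_3FUicfNemK">Product A</li>'
    '<li class="_3FUicfNemK">Product B</li>'
    '</ul></div></body></html>'
)

print(count_items(STATIC_HTML, "_3FUicfNemK"))    # 0
print(count_items(RENDERED_HTML, "_3FUicfNemK"))  # 2
```

If the count is zero for the HTML your script downloaded but the elements are visible in the browser, the content is most likely being loaded by JavaScript rather than blocked.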

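There are 2 options here imo..

1) Work out the background request and pass its response to BeautifulSoup. Using the Network tab in Chrome dev tools, I can see that the request for products is...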

https://shopping.naver.com/v1/products?_nc_=1583366400000&subVertical=HOME_LIVING&page=1&pageSize=10&sort=POPULARITY&filter=ALL&displayType=CATEGORY_HOME&includeZzim=true&includeViewCount=true&includeStoreCardInfo=true&includeStockQuantity=false&includeBrandInfo=false&includeBrandLogoImage=false&includeRepresentativeReview=false&includeListCardAttribute=false&includeRanking=false&includeRankingByMenus=false&includeStoreCategoryName=false&menuId=10004487&standardSizeKeys=&standardColorKeys=&attributeValueIds=&attributeValueIdsAll=&certifications=&menuIds=&includeStoreInfoWithHighRatingReview=false

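You should be able to guess which parts of the query string to adjust (for example the page number and page size) and use that.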
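A rough sketch of option 1, under several assumptions: the parameter names are my reading of the request URL quoted in this answer and should be checked against your own Network tab, the reduced parameter set may not be enough for the server, the endpoint may have changed since 2020, and I'm assuming it returns JSON (in which case you'd read the data directly rather than pass it to BeautifulSoup). `build_products_url` and `fetch_products` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Endpoint taken from the request seen in the browser's Network tab.
BASE_URL = "https://shopping.naver.com/v1/products"


def build_products_url(page=1, page_size=10, menu_id="10004487"):
    """Build the background-request URL for one page of products."""
    params = {
        "subVertical": "HOME_LIVING",
        "page": page,            # adjust to walk through result pages
        "pageSize": page_size,   # adjust to request more items per page
        "sort": "POPULARITY",
        "filter": "ALL",
        "displayType": "CATEGORY_HOME",
        "menuId": menu_id,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_products(page=1, page_size=10):
    """Request one page of products; assumes the endpoint answers with JSON."""
    # Third-party import kept local so the URL helper works without requests.
    import requests

    response = requests.get(
        build_products_url(page, page_size),
        headers={"User-Agent": "Mozilla/5.0"},  # some sites reject the default UA
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


print(build_products_url(page=2, page_size=20))
```

Calling `fetch_products(page=1)` would then perform the actual request. If the server rejects the shortened query, copy the full URL from the Network tab and change only the values you need.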

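2) Use a tool like Selenium, which can drive a real browser and execute any JavaScript for you, so you don't have to figure out that side of things. If you're not familiar with this stuff, there may be less of a web-technology learning curve this way.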
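A minimal sketch of option 2, assuming Selenium 4 and BeautifulSoup are installed and Chrome is available locally. The selector reuses the class name from the question, which may well have changed since it was posted; `fetch_rendered_items` is a hypothetical helper name.

```python
PAGE_URL = (
    "https://shopping.naver.com/living/homeliving/category"
    "?menu=10004487&sort=POPULARITY"
)
# Class name taken from the question; assumed to still mark product items.
PRODUCT_SELECTOR = "li._3FUicfNemK"


def fetch_rendered_items(url=PAGE_URL, selector=PRODUCT_SELECTOR, timeout=15):
    """Load the page in a real browser, wait for products, return matching tags."""
    # Third-party imports kept local so this file loads without them installed.
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one product element exists in the DOM;
        # raises TimeoutException if none appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        # page_source now contains the JavaScript-rendered HTML.
        soup = BeautifulSoup(driver.page_source, "html.parser")
        return soup.select(selector)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `fetch_rendered_items()` returns a list of BeautifulSoup tags that you can process exactly like the result of `find_all` in the question. The explicit wait matters here: reading `page_source` straight after `driver.get` can still return the page before the products have loaded.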

Related questions

How to crawl website body contents in order to confirm if string exists

How to crawl every page in a website in Python BeautifulSoup

How to crawl website slower in Python with Jupyter Notebook?

How to crawl a website/extract data into database with python?

Use python to crawl a website

Scrapy crawl: Crawled 0 pages

How to crawl a website to get all the links in a website using Scrapy in python?

how to crawl for a block of a website

How can I crawl all the <td> contents?(python3.6)

Detecting the language of a text from a crawled website in Python


Notice: Technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address.
