How to web scrape color palettes from a color palette website with Python?

I'm trying to web scrape a list of color palettes, with their RGB values, from a color palette design website. The HTML for each color palette looks like this:

<div class="item block shadow">
  <div class="palette">
    <div class="place c4" style="background-color: rgb(34, 14, 36);">...</div>
    <div class="place c3" style="background-color: rgb(52, 32, 86);">...</div>
    <div class="place c2" style="background-color: rgb(84, 84, 197);">...</div>
    <div class="place c1" style="background-color: rgb(99, 156, 217);">...</div>
  </div>
  ...
</div>

When I scraped this information with Python, the output didn't include the "style=..." part:

[<div class="palette">
<div class="place c4"><a href=""></a><span></span></div>
<div class="place c3"><a href=""></a><span></span></div>
<div class="place c2"><a href=""></a><span></span></div>
<div class="place c1"><a href=""></a><span></span></div>
</div>]

Is there a way to extract the information I'm looking for? Thanks in advance.

Edit: here's my code:

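import requests
from bs4 import BeautifulSoup

# Fetch the page and parse its HTML
page = requests.get('https://colorhunt.co/palettes/popular')
soup = BeautifulSoup(page.text, 'html.parser')

# Find the first palette card, then the palette container(s) inside it
repo = soup.find(class_="item block shadow")
repo_list = repo.find_all(class_='palette')
print(repo_list)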

The problem here is not that you're scraping the wrong part; it's that you're scraping the page before it has finished loading. (The site is https://colorhunt.co/, by the way.)

When you run page = requests.get('https://colorhunt.co/palettes/popular'), what you get back is the raw HTML file, before any of its JavaScript has run. Take a look at the script tags in the HTML you're getting: a lot of that JS code is there to render elements, especially in the palette sections you're trying to get. That's why none of the colors are in your scraped content.
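To make this concrete, here's a small stdlib-only sketch (no network needed) that pulls the rgb(...) values out of div.place elements. Fed the markup requests returned in your question, it finds nothing; fed the browser-rendered markup from your dev tools, it finds all four colors. The HTML strings are trimmed copies from the question, and `extract_palette_colors` / `PaletteColorParser` are illustrative names of my own. It assumes the colors live in an inline `background-color: rgb(...)` style, as your snippet shows.

```python
import re
from html.parser import HTMLParser

# Matches the inline style seen in dev tools, e.g. "background-color: rgb(34, 14, 36);"
RGB_RE = re.compile(r"background-color:\s*rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")


class PaletteColorParser(HTMLParser):
    """Collects an (r, g, b) tuple from every <div class="place ..."> that has an inline color."""

    def __init__(self):
        super().__init__()
        self.colors = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "place" in classes:
            match = RGB_RE.search(attrs.get("style") or "")
            if match:
                self.colors.append(tuple(int(v) for v in match.groups()))


def extract_palette_colors(html):
    parser = PaletteColorParser()
    parser.feed(html)
    parser.close()
    return parser.colors


# Roughly what requests.get() returned: the swatches exist but have no style yet.
STATIC_HTML = """<div class="palette">
<div class="place c4"><a href=""></a><span></span></div>
<div class="place c3"><a href=""></a><span></span></div>
<div class="place c2"><a href=""></a><span></span></div>
<div class="place c1"><a href=""></a><span></span></div>
</div>"""

# Roughly what the browser shows after the page's JavaScript has run.
RENDERED_HTML = """<div class="palette">
<div class="place c4" style="background-color: rgb(34, 14, 36);"></div>
<div class="place c3" style="background-color: rgb(52, 32, 86);"></div>
<div class="place c2" style="background-color: rgb(84, 84, 197);"></div>
<div class="place c1" style="background-color: rgb(99, 156, 217);"></div>
</div>"""

print(extract_palette_colors(STATIC_HTML))    # []
print(extract_palette_colors(RENDERED_HTML))  # [(34, 14, 36), (52, 32, 86), (84, 84, 197), (99, 156, 217)]
```

So the parsing side is easy; the hard part is getting the rendered HTML in the first place.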

Solution: use a library that simulates browsing the web from your code: load the webpage, wait for it to finish rendering, and only then scrape it. I come from JS, so my choice is puppeteer; there is also a Python port called pyppeteer, and I think there are more libraries like these.
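Here's a rough sketch of that approach with pyppeteer (pip install pyppeteer; the first launch downloads a bundled Chromium). It assumes the rendered markup matches your dev-tools snippet, i.e. each div.palette contains div.place swatches with an inline background color. The selectors, the networkidle2 wait, and the names `scrape_palettes` / `parse_rgb` are my own choices for illustration, not anything the site documents, so expect to adjust them if Color Hunt changes its markup.

```python
import re

URL = "https://colorhunt.co/palettes/popular"
# Assumption: swatches are <div class="place ..."> inside <div class="palette">,
# and the page's JS gives each one an inline background-color (as in the question).
SWATCH_SELECTOR = ".palette .place[style]"

RGB_RE = re.compile(r"rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)")


def parse_rgb(css_color):
    """Turn a CSS color like 'rgb(34, 14, 36)' into (34, 14, 36); None if it doesn't match."""
    match = RGB_RE.search(css_color or "")
    return tuple(int(v) for v in match.groups()) if match else None


async def scrape_palettes(url=URL):
    # Imported here so parse_rgb can be used without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch()  # headless Chromium
    try:
        page = await browser.newPage()
        # Let the page's JavaScript run before reading anything from the DOM.
        await page.goto(url, waitUntil="networkidle2")
        await page.waitForSelector(SWATCH_SELECTOR)
        # Read the live DOM: one list of background colors per palette.
        raw = await page.evaluate(
            """() => Array.from(document.querySelectorAll('.palette')).map(
                   p => Array.from(p.querySelectorAll('.place')).map(
                       d => d.style.backgroundColor))"""
        )
    finally:
        await browser.close()
    return [[parse_rgb(color) for color in palette] for palette in raw]
```

Call it with asyncio.run(scrape_palettes()). Each inner list holds one palette's colors as (r, g, b) tuples in DOM order (c4 first in your snippet). I read style.backgroundColor inside the page rather than passing await page.content() to BeautifulSoup, but once the page has rendered either route should get you the same colors.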

I hope this answer helps, even though it's been over a year since you asked. I ran into this problem myself and couldn't find a proper answer anywhere, so this will hopefully save others some time. Better late than never :)
