
Extracting Attribute Values of a div in BeautifulSoup

I am trying to scrape the 'project' value (the URL) from the following HTML. Is it possible? Any guidance would be highly appreciated.

<div class="js-project-group">
<div class="grid-row flex flex-wrap">
<div class="js-react-proj-card grid-col-12 grid-col-6-sm grid-col-4-lg" data-pid="564032676" data-project='{
           "id":564032676,
           "name":"SONOFF NSPanel Smart Scene Wall Switch",
           "urls":{"web":{"project":"https://www.kickstarter.com/projects/sonoffnspanel/sonoff-nspanel-smart-scene-wall-switch"}}}'>

           SOME CONTENT 
</div>
<div>SAME ALIKE DIV AS ABOVE</div>
</div> <!-- closing div of grid-row flex flex-wrap -->
<div>SAME ALIKE DIV AS THE FIRST</div>
</div>

EDIT: My try:

projectlist = soup.find_all('div', class_='js-project-group')
projectdata =[]
for project in projectlist:
    tag = project.find('div', class_="js-react-proj-card grid-col-12 grid-col-6-sm grid-col-4-lg")
    attribute = tag['data-project']
    projectdata.append(attribute)

It only gets the project data from the first div. What have I done wrong?

EDIT: Entire HTML structure:

<div class="js-project-group">
  <div class="grid-row flex flex-wrap">
    <div class="js-react-proj-card"></div>
    <div class="js-react-proj-card"></div>
    <div class="js-react-proj-card"></div>
  </div>
  <div class="grid-row flex flex-wrap">
    <div class="js-react-proj-card"></div>
    <div class="js-react-proj-card"></div>
    <div class="js-react-proj-card"></div>
  </div>
</div>


You should definitely invest a little more time in your question, so that the scenario is comprehensible and reproducible for everyone. Next time you will do it better, right? Check the documentation on how to create a minimal reproducible example.

What happens?

You try to scrape all projects, but you select the container instead. There is only one div.js-project-group, so your loop iterates only once, and find() inside it returns only the first matching card.
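To make the behaviour above visible, here is a minimal sketch using a trimmed, hypothetical version of the question's markup (one js-project-group container holding two cards). It uses Python's built-in html.parser rather than lxml, so no extra parser is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed markup modelled on the question: one container, two cards.
html = """
<div class="js-project-group">
  <div class="grid-row flex flex-wrap">
    <div class="js-react-proj-card" data-project='{"id": 1}'></div>
    <div class="js-react-proj-card" data-project='{"id": 2}'></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() on the container class matches only the single wrapper div...
containers = soup.find_all("div", class_="js-project-group")
print(len(containers))  # 1, so the loop body runs only once

# ...and find() inside that container returns only the FIRST matching card.
first_only = [c.find("div", class_="js-react-proj-card")["data-project"] for c in containers]
print(first_only)  # ['{"id": 1}']
```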

How to fix?

Select all the projects in the container and iterate over them:

soup.select('div.js-project-group div.js-react-proj-card.grid-col-12.grid-col-6-sm.grid-col-4-lg')
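If you prefer to stay with find_all(), a roughly equivalent approach (a sketch, not the answer's original code) is to search inside each container for every card instead of only the first. Note that class_="js-react-proj-card" matches any div whose class list contains that class, whereas a class_ string with spaces, as in the question, has to match the whole class attribute exactly. The markup and the shortened selector below are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the "entire HTML structure": two rows of cards.
html = """
<div class="js-project-group">
  <div class="grid-row flex flex-wrap">
    <div class="js-react-proj-card grid-col-12" data-project='{"id": 1}'></div>
    <div class="js-react-proj-card grid-col-12" data-project='{"id": 2}'></div>
  </div>
  <div class="grid-row flex flex-wrap">
    <div class="js-react-proj-card grid-col-12" data-project='{"id": 3}'></div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector (shortened): every card anywhere below a container.
via_select = [card["data-project"]
              for card in soup.select("div.js-project-group div.js-react-proj-card")]

# find_all() equivalent: loop over containers, then over ALL cards in each one.
via_find_all = []
for container in soup.find_all("div", class_="js-project-group"):
    for card in container.find_all("div", class_="js-react-proj-card"):
        via_find_all.append(card["data-project"])

print(via_select == via_find_all)  # True
print(len(via_select))  # 3
```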

To get the URL, you can json.loads() your attribute and navigate to the project URL:

json.loads(attribute)['urls']['web']['project']
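As a quick check of that lookup path, the sketch below applies it to the data-project value from the question's first card (whitespace trimmed). The .get() chain is an optional, more forgiving variant added here for illustration; it returns None instead of raising KeyError when a card lacks one of the nested keys.

```python
import json

# data-project value from the question's first card (whitespace trimmed).
attribute = ('{"id":564032676,'
             '"name":"SONOFF NSPanel Smart Scene Wall Switch",'
             '"urls":{"web":{"project":"https://www.kickstarter.com/projects/'
             'sonoffnspanel/sonoff-nspanel-smart-scene-wall-switch"}}}')

data = json.loads(attribute)          # JSON string -> nested dict
url = data["urls"]["web"]["project"]  # same path as in the answer
print(url)

# Forgiving variant: yields None instead of raising KeyError on missing keys.
safe_url = data.get("urls", {}).get("web", {}).get("project")
print(safe_url == url)  # True
```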

Example (based on the category technology / DIY electronics):

import json

import requests
from bs4 import BeautifulSoup

url = "https://www.kickstarter.com/discover/categories/technology/diy%20electronics"
# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

projectdata = []
# Iterate over every project card inside the container, not over the container itself
for project in soup.select('div.js-project-group div.js-react-proj-card.grid-col-12.grid-col-6-sm.grid-col-4-lg'):
    attribute = project['data-project']  # JSON string stored in the attribute
    projectdata.append(json.loads(attribute)['urls']['web']['project'])

print(projectdata)

Output

['https://www.kickstarter.com/projects/albertgajsak/circuitmess-batmobile',
 'https://www.kickstarter.com/projects/udoo/udoo-key-the-4-ai-platform',
 'https://www.kickstarter.com/projects/sonoffnspanel/sonoff-nspanel-smart-scene-wall-switch',
 'https://www.kickstarter.com/projects/olman/smart-trays',
 'https://www.kickstarter.com/projects/sbcshop1/pico-led-cube-on-raspberry-pi-pico',
 'https://www.kickstarter.com/projects/fluffee-anion-brush/fluffee-anion-pet-grooming-brush',
 'https://www.kickstarter.com/projects/172204344/ninja-counter-an-arduino-and-raspberry-pi-programmable-timer',
 'https://www.kickstarter.com/projects/amritsingh/pico-air-monitoring-expansion-measure-pm-level-with-pico',
 'https://www.kickstarter.com/projects/eddietay/retro-dreamer-g4a-cm4-by-my-retro-game-case',
 'https://www.kickstarter.com/projects/1405172335/self-powered-rotating-pumpkin-pedestal',
 'https://www.kickstarter.com/projects/udoo/udoo-android-linux-arduino-in-a-tiny-single-board',
 'https://www.kickstarter.com/projects/udoo/udoo-neo-raspberry-pi-arduino-wi-fi-bt-40-sensors']
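If some cards on a page lack the data-project attribute or carry something that is not valid JSON, the loop above would raise. A more defensive variant, sketched here as a hypothetical helper extract_project_urls (not part of the original answer), skips such cards and drops duplicate URLs while keeping page order. It runs offline on a made-up sample whose example.com URLs are placeholders, and uses a shortened card selector; the real page may need the full selector from the answer.

```python
import json

from bs4 import BeautifulSoup

# Shortened selector (assumption); the answer's version also requires grid-col classes.
CARD_SELECTOR = "div.js-project-group div.js-react-proj-card"


def extract_project_urls(html):
    """Hypothetical helper: project URLs from card divs, in page order, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for card in soup.select(CARD_SELECTOR):
        raw = card.get("data-project")  # None if the attribute is missing
        if raw is None:
            continue
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip cards whose attribute is not valid JSON
        url = data.get("urls", {}).get("web", {}).get("project")
        if url and url not in urls:
            urls.append(url)
    return urls


sample = """
<div class="js-project-group">
  <div class="grid-row flex flex-wrap">
    <div class="js-react-proj-card" data-project='{"urls": {"web": {"project": "https://example.com/a"}}}'></div>
    <div class="js-react-proj-card" data-project='not json'></div>
    <div class="js-react-proj-card"></div>
    <div class="js-react-proj-card" data-project='{"urls": {"web": {"project": "https://example.com/a"}}}'></div>
  </div>
  <div class="grid-row flex flex-wrap">
    <div class="js-react-proj-card" data-project='{"urls": {"web": {"project": "https://example.com/b"}}}'></div>
  </div>
</div>
"""
print(extract_project_urls(sample))  # ['https://example.com/a', 'https://example.com/b']
```

For the live page, you would pass response.text to the helper instead of the sample string.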
