
Scraping a specific tag and keyword, and printing the info associated with it using BeautifulSoup

I'm trying to scrape https://store.fabspy.com/collections/new-arrivals-beauty for the sapphire eyeliner product and return the information associated with that product's ID. So far I have:

from bs4 import BeautifulSoup
import urllib2
url = 'https://store.fabspy.com/collections/new-arrivals-beauty'
page = BeautifulSoup(url.read())
soup = BeautifulSoup((page))
tag = 'div class="product-content"'
if row in soup.html.body.findAll(tag):
    data = row.findAll('id')
    if data and 'sapphire' in data[0].text:
        print data[4].text

The information I'd like to receive is as follows:

<div class="product-content">
    <div class="pc-inner"> 
      <div data-handle="clematis-dewdrop-sparkling-eye-pencil-g7454c-sapphire" 
           data-target="#quick-shop-popup"
           class="quick_shop quick-shop-button"
           data-toggle="modal"
           title="Quick View">
        <span>+ Quick View</span>
        <span class="json hide">
          {
            "id":8779050374,
            "title":"Clematis - Dewdrop Sparkling Gel Eye Liner Pencil # G7454C**Sapphire**",
            "handle":"clematis-dewdrop-sparkling-eye-pencil-g7454c-sapphire",
            "description":"\u003cdiv\u003e\r\n\r\nGel Formula, Rich Colour, Matte Finish, Long-Wearing, Safe for Waterline\r\n\r\n\u003cbr\u003e\n\u003c\/div\u003e\u003cdiv\u003e\u003cbr\u003e\u003c\/div\u003e \u003cimg alt=\"\" src=\"\/\/i.imgur.com\/adW5MKl.jpg\"\u003e",
            "published_at":"2016-10-17T20:15:40+08:00",
            "created_at":"2016-10-17T20:15:40+08:00",
            "vendor":"Clematis",
            "type":"Latest,Beauty,New,Makeup,Best, Clematis, Eyes",
            "tags":["Beauty","Best","Clematis","Eyes","Latest","Makeup","New"],
            "price":4900,
            "price_min":4900,
            "price_max":4900,
            "available":true,
            "price_varies":false,
            "compare_at_price":7900,
            "compare_at_price_min":7900,
            "compare_at_price_max":7900,
            "compare_at_price_varies":false,
            "variants":[{"id":31447937030", "title":"N\/A"]
          }

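I'm particularly interested in the id at the end. Please tell me which tag my script should focus on to retrieve this information, and how to use the sapphire keyword to find the sapphire color and its id. Thanks!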

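There are a few errors in your existing code. I suggest using requests instead of urllib2. I'm also using the re and json libraries. So this is what I would do in your case (please read the code comments for the explanation).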

from bs4 import BeautifulSoup
import requests
import re
import json
# URL to scrape
url = 'https://store.fabspy.com/collections/new-arrivals-beauty'

# HTML data of the page
# raise_for_status() raises an exception on 404 and other HTTP error responses
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.text, "lxml")

# Get a list of all elements having `sapphire` in the `data-handle` attribute
sapphire = soup.findAll(attrs={'data-handle': re.compile(r".*sapphire.*")})
# Take first element of this list (I checked, there is just one element)
sapphire = sapphire[0]

# Find class inside this element having JSON data. Taking just first element's text
json_text = sapphire.findAll(attrs={'class': "json"})[0].text

# Converting it to a dictionary
data = json.loads(json_text)
print(data["id"])
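
If you want to try the same idea without hitting the live site, here is a minimal offline sketch. It assumes the markup looks like the excerpt in the question; `SAMPLE_HTML` and `find_product` are names I made up for illustration, and the `variants` entry is tidied into valid JSON, which is a guess about what the real page contains. It uses the built-in `html.parser` so lxml is not required.

```python
import json
import re

from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question. The "variants" entry is
# tidied into valid JSON here, which is an assumption about the real page.
SAMPLE_HTML = """
<div class="product-content">
  <div class="pc-inner">
    <div data-handle="clematis-dewdrop-sparkling-eye-pencil-g7454c-sapphire"
         class="quick_shop quick-shop-button">
      <span>+ Quick View</span>
      <span class="json hide">
        {
          "id": 8779050374,
          "handle": "clematis-dewdrop-sparkling-eye-pencil-g7454c-sapphire",
          "price": 4900,
          "variants": [{"id": 31447937030, "title": "N/A"}]
        }
      </span>
    </div>
  </div>
</div>
"""


def find_product(html, keyword):
    """Return the parsed JSON of the first element whose data-handle
    contains `keyword`, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    # A compiled regex is matched with search(), so no ".*" padding is needed
    element = soup.find(attrs={"data-handle": re.compile(re.escape(keyword))})
    if element is None:
        return None
    # class_="json" matches any element whose class list includes "json"
    json_span = element.find("span", class_="json")
    if json_span is None:
        return None
    return json.loads(json_span.get_text())


product = find_product(SAMPLE_HTML, "sapphire")
if product is not None:
    print(product["id"])                           # product id
    print([v["id"] for v in product["variants"]])  # variant ids
```

If "the id at the end" means the id inside `variants`, the last line is the part to look at: `variants` is a list, so a product with several variants would give you several ids. Returning None instead of indexing `[0]` directly also avoids an IndexError when no product matches the keyword.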

