
Extract an HTML ID from inside a tag using Beautiful Soup in Python

I am trying to extract only the iid code from the HTML so that I can append it to a URL and open the page I need.

I can find the tag I need by specifying its class. However, I also get 4 other tags in the output. All I want is the iid inside the first tag, "183988596953".

I have tried using this code to select only the iid:

rslt_table = soup.find_all("iid",{"div class": "lvpic pic img left"})

However, this only seems to return an empty list, [].

When I replace the line above with the second-to-last line of the code below, I get the output with the 4 tags I mentioned:

from bs4 import BeautifulSoup
import requests
import re

urls = ['https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=goldfinger+quad']

#https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=

def find_id(urls):
    for url in urls:
        session = requests.session()
        response = session.get(url)
        #soup = BeautifulSoup(response.content, "lxml")
        soup = BeautifulSoup(response.content, "html.parser")
        rslt_table = soup.find("div", {"class": "lvpic pic img left"})
        return(rslt_table)

My search URL is https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=goldfinger+quad

The full output is:

<div class="lvpic pic img left" iid="183988596953">
<div class="lvpicinner full-width picW">
<a class="img imgWr2" href="https://www.ebay.co.uk/itm/GOLDFINGER-1964-Style-A-B-UK-Cinema-High-Quality-Repro-30-x-40-quad-poster/183988596953?hash=item2ad69330d9:g:rYQAAOSwrENdbmEW">
<img alt='GOLDFINGER 1964 Style A &amp; B -  UK Cinema High Quality Repro 30"x 40" quad poster' class="img" src="https://i.ebayimg.com/thumbs/images/g/rYQAAOSwrENdbmEW/s-l225.jpg"/>
</a>
</div></div>

Your code, updated:

  • Use attrs to get all of the tag's attributes as a dictionary, then look up the 'iid' key
    • {'class': ['lvpic', 'pic', 'img', 'left'], 'iid': '183988596953'}
def find_id(urls):
    for url in urls:
        session = requests.session()
        response = session.get(url)
        soup = BeautifulSoup(response.content, "html.parser")
        return soup.find("div", {"class": "lvpic pic img left"}).attrs['iid']

iid = find_id(urls)

print(iid)

>>> 183988596953
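
To show why the original find_all("iid", ...) call returned an empty list, here is a small offline sketch that parses a trimmed copy of the HTML from the question instead of fetching the live page. The names SAMPLE_HTML and extract_iid are made up for this illustration. The sketch assumes the page still uses the "lvpic pic img left" class, which may no longer be true of the live site.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the fragment shown in the question, so the example runs offline.
SAMPLE_HTML = """
<div class="lvpic pic img left" iid="183988596953">
<div class="lvpicinner full-width picW"></div>
</div>
"""


def extract_iid(html):
    """Return the iid attribute of the first matching div, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", {"class": "lvpic pic img left"})
    if tag is None:
        return None
    # Tag.get() returns None instead of raising KeyError when the attribute is missing.
    return tag.get("iid")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The first argument of find_all() is a tag NAME. There is no <iid> element
# in the page, so this search matches nothing.
no_match = soup.find_all("iid", {"div class": "lvpic pic img left"})

iid = extract_iid(SAMPLE_HTML)
print(no_match)
print(iid)
```

iid is an attribute of the div, not a tag, so it has to be read from the matched tag with tag["iid"], tag.get("iid"), or tag.attrs["iid"]. Using get() and checking for None avoids an exception when the page layout changes and the div is not found.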

If you want all the iid values:

def find_id(urls):
    for url in urls:
        session = requests.session()
        response = session.get(url)
        soup = BeautifulSoup(response.content, "html.parser")
        div = soup.find_all("div", attrs={'class': 'lvpic pic img left'})
        return [iid.attrs['iid'] for iid in div]
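
Since the goal in the question is to append each iid to a URL, the following offline sketch shows one way to do that. The ITEM_URL_PREFIX value is an assumption about the item URL format and is not confirmed by the question. SAMPLE_RESULTS, its second iid, and collect_item_urls are placeholders made up for illustration. The sketch uses a CSS selector so that divs without an iid attribute are skipped.

```python
from bs4 import BeautifulSoup

# Assumption: item pages can be reached by appending the iid to this prefix.
ITEM_URL_PREFIX = "https://www.ebay.co.uk/itm/"

# Placeholder markup; only the first iid comes from the question.
SAMPLE_RESULTS = """
<div class="lvpic pic img left" iid="183988596953"></div>
<div class="lvpic pic img left" iid="111111111111"></div>
<div class="lvpic pic img left"></div>
"""


def collect_item_urls(html, prefix=ITEM_URL_PREFIX):
    """Build one URL per result div that carries an iid attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.lvpic[iid]" matches div elements with class lvpic AND an iid attribute.
    tags = soup.select("div.lvpic[iid]")
    return [prefix + tag["iid"] for tag in tags]


item_urls = collect_item_urls(SAMPLE_RESULTS)
for item_url in item_urls:
    print(item_url)
```

Filtering on [iid] in the selector means tag["iid"] cannot raise a KeyError for result divs that lack the attribute.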
