How do I get a specific line of HTML from a certain text parameter in Python Requests/Beautiful Soup?

I am trying to scrape a website that has shoes. Each shoe size has a unique "variant" id. I need to figure out how to get that id based on what shoe size I want. An example of the HTML of the site is:

    <label for="variant_id_104685">43</label>

In this example, the shoe size is "43". I need to get the variant_id_104685 segment without knowing it in advance. In other words, the input would be the size 43 and the output would be that variant id.

How should I go about doing that?

You can get the label element by text and then extract the for attribute value:

    size = "43"
    soup.find(attrs={"for": True}, text=size)["for"]

Demo:

    In [1]: from bs4 import BeautifulSoup

    In [2]: data = '<label for="variant_id_104685">43</label>'

    In [3]: soup = BeautifulSoup(data, "html.parser")

    In [4]: size = "43"

    In [5]: soup.find(attrs={"for": True}, text=size)["for"]
    Out[5]: 'variant_id_104685'
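
For repeated lookups, the same call could be wrapped in a small helper. The following is only a sketch and not part of the original answer: the function name variant_id_for_size and the second label in the sample markup are hypothetical. It restricts the search to label tags, returns None when the size is not found instead of raising, and uses string=, which is the name Beautiful Soup 4.4.0 and later give to the text= argument.

```python
from bs4 import BeautifulSoup


def variant_id_for_size(html, size):
    """Return the `for` value of the <label> whose text equals `size`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Restrict the search to <label> tags that carry a `for` attribute
    # and whose text is exactly the requested size.
    label = soup.find("label", attrs={"for": True}, string=str(size))
    return label["for"] if label is not None else None


# Hypothetical markup: the second label is made up for illustration.
sample = (
    '<label for="variant_id_104685">43</label>'
    '<label for="variant_id_000000">44</label>'
)
print(variant_id_for_size(sample, 43))
print(variant_id_for_size(sample, "50"))
```

The match on the label text is exact, so markup with extra whitespace around the size would need the text normalized first.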

Note that {"for": True} restricts the match to elements that have a for attribute defined. There is usually a more concise way to require the presence of an attribute, using a keyword argument like soup.find(attribute_name=True), but for is a reserved keyword in Python, so something like soup.find(text=size, for=True)["for"] would result in a syntax error.
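
To make the difference concrete, the sketch below contrasts the keyword form, which works for an ordinary attribute name such as id, with the attrs dictionary that for requires. The markup is a made-up variation of the example above; the id value is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the `id` attribute is added only for this illustration.
data = '<label id="size-43" for="variant_id_104685">43</label>'
soup = BeautifulSoup(data, "html.parser")

# `id` is not a Python keyword, so the keyword-argument form is valid.
by_keyword = soup.find(id=True)

# `for` is a Python keyword, so it has to go through the attrs dictionary.
by_attrs = soup.find(attrs={"for": True})

# Both searches find the same <label>, so the `for` values agree.
print(by_keyword["for"] == by_attrs["for"])
print(by_attrs["for"])
```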
