
Beautiful soup extraction of text within span tag

I am trying to extract the text Weight: 16.5 pounds from the following HTML:

<div class="product__description__text">.........
<p dir="ltr"><span><strong>Dimensions:</strong> 39 x 17.3 x 32.2 inches</span></p><p dir="ltr"><span><strong>Weight:</strong> 16.5 pounds</span></p><p dir="ltr"><span><strong>Weight limit:</strong> 35 pounds</span></p><p dir="ltr"><span><strong>Height limit:</strong>&nbsp;32 inches</span></p></div>

Here's what I've tried so far:

results = soup.find_all('div', attrs={'class':'product'})
Weight_L = []
for result in results:
    if result.find('p', attrs={'dir':'ltr'})is not None:
        weight = result.span.text
    Weight_L.append(weight)

If you only need the weight, I suggest checking whether the keyword "Weight:" appears in each p tag. Note that find returns only the first match, so if the first p tag is not the weight one (here it is the Dimensions one), you will never reach it; use find_all and loop over the results instead. Also, the class name on your div is product__description__text, not product, so search for product__description__text.

results = soup.find_all('div', attrs={'class':'product__description__text'})
Weight_L = []
for result in results:
    p_tags = result.find_all('p', attrs={'dir':'ltr'})
    for tag in p_tags:
        if "Weight:" in tag.text:
            weight = tag.text
            Weight_L.append(weight)

If soup is parsed from the HTML you posted, the result is: ['Weight: 16.5 pounds']
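For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the HTML is available as a string (the elided "........." part of the original markup is left out) and uses the built-in html.parser; the helper name extract_weights is only illustrative. It matches on the start of the paragraph text so that "Weight limit:" is not picked up.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (elided leading content omitted).
html = """<div class="product__description__text">
<p dir="ltr"><span><strong>Dimensions:</strong> 39 x 17.3 x 32.2 inches</span></p>
<p dir="ltr"><span><strong>Weight:</strong> 16.5 pounds</span></p>
<p dir="ltr"><span><strong>Weight limit:</strong> 35 pounds</span></p>
<p dir="ltr"><span><strong>Height limit:</strong>&nbsp;32 inches</span></p>
</div>"""

soup = BeautifulSoup(html, "html.parser")


def extract_weights(soup):
    """Collect the text of every <p dir="ltr"> that starts with 'Weight:'."""
    weights = []
    for div in soup.find_all("div", class_="product__description__text"):
        for p in div.find_all("p", attrs={"dir": "ltr"}):
            text = p.get_text().strip()
            # startswith avoids matching "Weight limit: ..."
            if text.startswith("Weight:"):
                weights.append(text)
    return weights


weights = extract_weights(soup)
print(weights)
```

With the sample markup above this prints ['Weight: 16.5 pounds'].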

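The text Weight: 16.5 pounds is in the second p tag under the parent element with class product__description__text, so you can select it with p:nth-child(2). This relies on that p being the second child element of its parent in the full page.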

results = soup.select(".product__description__text p:nth-child(2)")
Weight_L = []
for result in results:
    Weight_L.append(result.text)
    print(result.text)
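A runnable sketch of the positional selector, again assuming the HTML is held in a string. Because nth-child breaks if the paragraphs are reordered, the sketch also shows a label-based lookup as an alternative; that second approach is not from the answers above, just one possible variation, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (elided leading content omitted).
html = (
    '<div class="product__description__text">'
    '<p dir="ltr"><span><strong>Dimensions:</strong> 39 x 17.3 x 32.2 inches</span></p>'
    '<p dir="ltr"><span><strong>Weight:</strong> 16.5 pounds</span></p>'
    '<p dir="ltr"><span><strong>Weight limit:</strong> 35 pounds</span></p>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Positional: the second child element of the description div.
by_position = [
    p.get_text()
    for p in soup.select(".product__description__text p:nth-child(2)")
]

# Label-based alternative: find the <strong> whose text is exactly "Weight:"
# and read the text of its enclosing <span>.
label = soup.find("strong", string="Weight:")
by_label = label.parent.get_text() if label is not None else None

print(by_position)
print(by_label)
```

The label-based lookup returns None instead of raising when the page has no "Weight:" entry, which may be convenient when looping over many product pages.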


 