How to get the text inside a <strong> tag using Python

I have tried multiple methods, to no avail.

I have this simple HTML, from which I want to extract the number 373 and then do some division.

 <span id="ctl00_cph1_lblRecCount">Records Found: <strong> 373</strong></span> 

I attempted to get the number with the Python script below:

import math
import re

import requests
import urllib3
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Excerpt from inside a loop (hence the `continue` below); NSN, NSNdriver and
# date are presumably defined earlier in the script and are not shown here.
NSNpreviousAwardRef = "https://www.dibbs.bsm.dla.mil/Awards/AwdRecs.aspx?Category=nsn&TypeSrch=cq&Value="+NSN+"&Scope=all&Sort=nsn&EndDate=&StartDate=&lowCnt=&hiCnt="

NSNdriver.get(NSNpreviousAwardRef)

previousAwardSoup = BeautifulSoup(NSNdriver.page_source, "html5lib")

# parsing of table
try:
    # This is the line that raises the error when find() returns None
    totalPrevAward = previousAwardSoup.find("span", {"id": "ctl00_cph1_lblRecCount"}).strong.text
    awardpagetotala = float(totalPrevAward) / 50
    awardpagetotal = math.ceil(awardpagetotala) + 1
    print(date)
    print("total previous awards: " + str(totalPrevAward))
    print("page total : " + str(awardpagetotal))
except Exception as e:
    print(e)
    continue

All I get is this error:

'NoneType' object has no attribute 'strong'

I tried parsing the HTML with lxml and still got the same error. What am I doing wrong, and how can I fix it?

The code to access the strong tag, soup.find("span").strong , is correct. You can check this yourself by putting that HTML line in a variable and creating your BeautifulSoup object from that variable.
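As a quick check, something like the following should work. It is only a sketch: it uses Python's built-in "html.parser" instead of the html5lib parser from the question, and the variable names html, span, count_text and count are made up for illustration.

```python
from bs4 import BeautifulSoup

# The HTML line from the question, stored in a variable.
html = '<span id="ctl00_cph1_lblRecCount">Records Found: <strong> 373</strong></span>'

# Assumption: "html.parser" (bundled with Python) is used here so that no
# extra parser has to be installed; the question used "html5lib".
soup = BeautifulSoup(html, "html.parser")

# Same lookup as in the question.
span = soup.find("span", {"id": "ctl00_cph1_lblRecCount"})
count_text = span.strong.text        # text inside <strong>, including the leading space
count = int(count_text.strip())      # strip whitespace before converting
print(count)
```

If this works on the literal HTML but fails in the real script, the lookup itself is fine and the problem lies in the page source that was actually parsed.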

Now, the error tells you that find() returned None, which means no span tag with that id exists in the HTML that was parsed. Here are some potential sources of the problem, off the top of my head:

  • Are you sure the HTML you feed into BeautifulSoup to create previousAwardSoup is the page you expect?
  • Are you sure that the id attribute is correct? More specifically, is it always the same and not randomized?

Print your previousAwardSoup and check whether it contains the span tag you are searching for.
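One way to make that check part of the script is to test for None before touching .strong. The sketch below is an assumption-laden illustration, not the asker's code: record_count is a hypothetical helper, the good and bad strings stand in for NSNdriver.page_source, and "html.parser" is again used instead of html5lib.

```python
import math

from bs4 import BeautifulSoup


def record_count(page_source):
    """Return the record count as an int, or None if the span is missing.

    Hypothetical helper, not part of the original script.
    """
    soup = BeautifulSoup(page_source, "html.parser")
    span = soup.find("span", {"id": "ctl00_cph1_lblRecCount"})
    if span is None or span.strong is None:
        return None
    return int(span.strong.get_text(strip=True))


# Stand-ins for NSNdriver.page_source: one page with the span, one without.
good = '<span id="ctl00_cph1_lblRecCount">Records Found: <strong> 373</strong></span>'
bad = "<html><body><p>Some other page</p></body></html>"

for source in (good, bad):
    count = record_count(source)
    if count is None:
        # Show the start of what was actually parsed, to see which page came back.
        print("span not found; start of page source:")
        print(source[:200])
    else:
        # Pages of 50 records; the question's code adds 1 on top of this.
        print("records:", count, "pages:", math.ceil(count / 50))
```

Printing the start of the page source when the span is missing usually shows quickly whether you received a different page (for example a login or consent page) than the one you inspected in the browser.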

