
Get the text from a div tag in html with bs4 python

I have a Flask website and I want to pull the text from a div tag on an external website using bs4.

# Importing libraries
from flask import Flask, render_template 
import sys
import json
import requests
import urllib.request
import time
from bs4 import BeautifulSoup


#Importing files and class from other python files in the project
sys.path.append('.')
from webScrape import getInformation

#Making a new app instance
app = Flask(__name__)

# When a request comes in on route /, render index.html
@app.route('/')
def index():
    URL = 'https://covidstat.info/home'

    HTML = requests.get(URL)
    soup = BeautifulSoup(HTML.text, "html.parser")
    tag = soup.findAll('div', {'class': 'count'})
    print(tag.text)
    return render_template('index.html', tag=tag)

#Running the app on port 5000
if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0')

Oh, and I have another question: does anyone know how I can get an element using XPath in bs4?

soup.findAll returns a list of all matching divs, not a single tag, which is why tag.text raises an AttributeError. You have to access the elements individually, for example in a loop. You can also use a list comprehension like this:

tag_text = [t.text for t in tag]

Which returns: ['2,735,342', '2,025,878', '329,757', '442', '4', '2,615,920']

Alternatively, you can use soup.find instead, which returns only the first matching div, so you can access its text directly with tag.text, which gives '2,735,342'.
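To make the difference between the two calls concrete, here is a minimal sketch that runs against a small inline HTML string instead of the live page. The markup and the label divs are made up for illustration (the real site's structure may differ); only the "count" class and the numbers come from the question and answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the live site may differ.
SAMPLE_HTML = """
<ul>
  <li><div class="label">Confirmed</div><div class="count">2,735,342</div></li>
  <li><div class="label">Recovered</div><div class="count">2,025,878</div></li>
  <li><div class="label">Deaths</div><div class="count">329,757</div></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a list-like ResultSet, so .text has to be read per element.
all_counts = [div.text for div in soup.find_all("div", {"class": "count"})]
print(all_counts)

# find returns only the first matching Tag, or None when nothing matches.
first = soup.find("div", {"class": "count"})
first_count = first.text if first is not None else None
print(first_count)
```

In the Flask view from the question, passing a list of strings like all_counts to render_template (rather than the raw result of findAll) would probably be the simplest fix, since the template can then loop over plain text values.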

To get an element's XPath, use the browser inspector: right-click the text you want -> Inspect Element -> right-click the div tag -> Copy -> XPath.

The xpath for the number used before would be:

/html/body/div[1]/div/div[2]/div[2]/div/div/div[2]/div/div[1]/div/div[1]/div/div/div/ul/li[1]/div[2]

As far as I know, BS4 does not support XPath selection, so you'd have to switch to another library. I know Selenium supports it, but it would probably not be the best fit for this task.
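One library that does support XPath and needs no browser is lxml. This is not what the answer above suggests, just a possible alternative; the sketch below assumes lxml is installed and uses a made-up HTML snippet, so the short XPath expressions here are illustrative and not the long path copied from the inspector.

```python
from lxml import html

# Hypothetical markup standing in for the real page; the live site may differ.
SAMPLE_HTML = """
<html><body>
<ul>
  <li><div class="label">Confirmed</div><div class="count">2,735,342</div></li>
  <li><div class="label">Recovered</div><div class="count">2,025,878</div></li>
</ul>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)

# Select by attribute: the text of every div whose class is exactly "count".
counts = tree.xpath('//div[@class="count"]/text()')
print(counts)

# Select by position, in the style of a path copied from the inspector.
first_count = tree.xpath('/html/body/ul/li[1]/div[2]/text()')
print(first_count)
```

Note that xpath() always returns a list, even for a single match. Paths copied from the browser inspector describe the page after JavaScript has run, so they may not match the raw HTML that requests downloads.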


 