Scraping data using BeautifulSoup

Question

I'm trying to scrape the data from this site into a dictionary:

from bs4 import BeautifulSoup 
import requests 
from pprint import pprint

page = requests.get('https://webscraper.io/') 
soup = BeautifulSoup(page.text, "lxml")

info = []
for x in range(1,7):
    items = soup.findAll("div",{"class":f"info{x}"})
    info.append(items)

However, the HTML tags are not being removed.
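The tags survive because find_all (findAll is an older alias) returns a list-like ResultSet of Tag objects, and printing a Tag prints its markup. Calling get_text() on each Tag (the .text property does the same thing) returns only the text. The sketch below runs offline against a small, hypothetical HTML snippet. The info1/info2 markup is invented to mirror the class names in the question, and it uses the built-in "html.parser" so lxml is not required. It collects the text into a dictionary keyed by class name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (not the actual site content).
html = """
<div class="info1"><h2>Heading one</h2><p>Description one</p></div>
<div class="info2"><h2>Heading two</h2><p>Description two</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

data = {}
for x in range(1, 3):
    cls = f"info{x}"
    tags = soup.find_all("div", class_=cls)
    # Each element is a Tag; str(tag) would still include the HTML markup.
    # get_text() keeps only the text nodes, joined here with a single space.
    data[cls] = [tag.get_text(" ", strip=True) for tag in tags]

print(data)
```

The difference is between appending the Tag objects themselves (what the question's loop does) and appending the strings extracted from them.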

Answer 1

Something like this might work. (Replace the webscraper.io URL with your actual request URL. You'd also still need to clean up the \n characters in the output.)

from bs4 import BeautifulSoup
import requests

page = requests.get('https://webscraper.io/')
soup = BeautifulSoup(page.text, "lxml")

info = []
for x in range(1, 7):
    # find_all returns Tag objects; .text keeps only their text content
    items = soup.find_all("div", {"class": f"info{x}"})
    # Extend info with one string per matching div (not one list per class)
    info += [item.text for item in items]

That is, take item.text for each item and concatenate the resulting list with info.
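Two details from this answer are easier to see side by side. First, info += [...] extends info with strings, whereas the question's info.append(items) adds each whole ResultSet as a single nested element. Second, the leftover \n characters can be dropped by passing strip=True to get_text(), which strips every text node and skips the whitespace-only ones. The markup below is hypothetical, with the indentation a real page typically has.

```python
from bs4 import BeautifulSoup

# Hypothetical markup containing the newlines/indentation a real page usually has.
html = """
<div class="info1">
    <h2>Heading one</h2>
    <p>Description one</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
items = soup.find_all("div", class_="info1")

nested = []
nested.append(items)                     # one element: the whole ResultSet
flat = []
flat += [item.text for item in items]    # one string per matching div

raw = flat[0]                                   # still contains '\n' and spaces
clean = items[0].get_text(" | ", strip=True)    # text nodes stripped, then joined

print(len(nested), len(flat))
print(repr(raw))
print(clean)
```

The " | " separator is an arbitrary choice for illustration; any string (or none) works.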

Answer 2

You need to use .text. Then, to get the output in the form you want, you'd need to do a bit of string manipulation.

from bs4 import BeautifulSoup
import requests

url = 'https://webscraper.io/'
page = requests.get(url)
soup = BeautifulSoup(page.text, "lxml")

info = []
for x in range(1, 7):
    div = soup.find("div", {"class": f"info{x}"})
    if div is None:  # find() returns None when no div has this class
        continue
    # .text drops the tags; turn line breaks into "heading: text" separators
    item = div.text.strip().replace('\n', ': ')
    info.append(item)

info = '\n'.join(info)
print(info)
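Since the question asks for a dictionary rather than a joined string, the same idea can split each block's text into a key and a value. This is a sketch under a hypothetical assumption: each infoN div holds a heading followed by a description. It uses get_text("\n", strip=True) so each text node lands on its own line, then str.partition splits on the first newline. The markup is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: heading + description per block; info4..info6 are absent.
html = """
<div class="info1"><h2>Heading one</h2><p>Description one.</p></div>
<div class="info2"><h2>Heading two</h2><p>Description two.</p></div>
<div class="info3"><h2>Heading three</h2></div>
"""
soup = BeautifulSoup(html, "html.parser")

info = {}
for x in range(1, 7):
    div = soup.find("div", class_=f"info{x}")
    if div is None:  # skip classes that do not appear in the markup
        continue
    # One text node per line: the heading first, then the description (if any).
    text = div.get_text("\n", strip=True)
    key, _, value = text.partition("\n")
    info[key] = value

print(info)
```

A block with no description maps to an empty string rather than raising, because partition returns empty parts when the separator is missing.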

2 answers

solution1 1 2020-01-08 13:12:12

solution2 1 ACCEPTED 2020-01-08 13:52:28