Trouble in scraping specific element from a page using beautifulsoup python

Question

I am new to Python and am trying to scrape HTML with the BeautifulSoup library.

I need to fetch the date field as day and date (e.g. 'Wed 11/1'), and the precip field as the value together with its unit of measurement (e.g. '0mm').

Python code

dates=[]
Precip=[]
for row in right_table.findAll("tr"):
    cells = row.findAll('td')
    th_cells=row.findAll('th') #To store second column data
    if len(cells)==5:
        Precip.append(cells[1].find(text=True))
        dates.append(th_cells[0].find(text=True))
print(dates)
print(Precip)

Code Output

['Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ']
['0 ', '0 ', '0 ', '1 ', '3 ', '3 ', '13 ', '0 ', '0 ', '0 ', '0 ', '0 ', '\xa0', '1 ', '3 ', '0 ', '1 ', '4 ', '2 ', '9 ', '2 ', '0 ', '1 ', '0 ', '0 ', '0 ', '0 ', '0 ', '1 ', '2 ']

Required Output

['Wed 11/1','Thur 11/2'.......]

['0mm','0mm'....]

Below is the HTML that I am trying to parse:

HTML

<class 'list'>: ['\\n',
<thead>
  <tr> <th>Date</th> <th>Hi/Lo</th> <th>Precip</th> <th>Snow</th> <th>Forecast</th> <th>Avg. HI / LO</th> </tr>
</thead>, '\\n',
<tbody>
  <tr class="pre">
    <th scope="row">Wed <time>11/1</time></th>
    <td>25°/20°</td>
    <td>0 <span class="small">mm</span></td>
    <td>0 <span class="small">CM</span></td>
    <td> </td>
    <td>28°/18°</td>
  </tr>
  <tr class="pre">
    <th scope="row">Thu <time>11/2</time></th>
    <td>28°/19°</td>
    <td>0 <span class="small">mm</span></td>
    <td>0 <span class="small">CM</span></td>
    <td> </td>
    <td>27°/18°</td>
  </tr>

Answer 1

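I'd use .text instead of .find(text=True). find(text=True) returns only the first text node inside the element, so the text held in nested tags such as <time> (the date) and <span> (the unit) is never fetched. .text concatenates all of the text inside the element, including the text of its child tags.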
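The difference is easiest to see on a single row. The following is a minimal sketch, assuming the row markup shown in the question's HTML dump; the variable names are illustrative only, and string=True is the newer spelling of text=True (available since Beautiful Soup 4.4.0).

```python
from bs4 import BeautifulSoup

# One row copied from the question's HTML dump, wrapped in a <table>
# so the fragment parses on its own.
row_html = (
    '<table><tr class="pre">'
    '<th scope="row">Wed <time>11/1</time></th>'
    '<td>25°/20°</td>'
    '<td>0 <span class="small">mm</span></td>'
    '</tr></table>'
)

soup = BeautifulSoup(row_html, "html.parser")
date_cell = soup.find("th")
precip_cell = soup.find_all("td")[1]  # second <td> holds the precipitation

# find(string=True) stops at the first text node and ignores child tags.
first_date_text = date_cell.find(string=True)      # 'Wed '
first_precip_text = precip_cell.find(string=True)  # '0 '

# .text (shorthand for get_text()) joins every text node in the element.
full_date_text = date_cell.text      # 'Wed 11/1'
full_precip_text = precip_cell.text  # '0 mm'

print(repr(first_date_text), repr(full_date_text))
print(repr(first_precip_text), repr(full_precip_text))
```

The same change applied to the full scraper: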

from bs4 import BeautifulSoup
import requests

html = requests.get("https://www.accuweather.com/en/in/bengaluru/204108/month/204108?view=table").text
soup = BeautifulSoup(html, 'html.parser')



right_table = soup.find("tbody")
dates = []
Precip = []
for row in right_table.find_all("tr"):
    cells = row.find_all('td')
    th_cells = row.find_all('th')
    if len(cells) == 5:
        # .text includes the text of nested tags (<span>, <time>)
        Precip.append(cells[1].text)    # second <td>: precipitation plus unit
        dates.append(th_cells[0].text)  # row header: day plus date
print(dates)
print(Precip)

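This produces the requested output, apart from the space between the value and the unit and a '\xa0' (non-breaking space) where the precipitation cell is empty: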

['Wed 11/1', 'Thu 11/2', 'Fri 11/3', 'Sat 11/4', 'Sun 11/5', 'Mon 11/6', 'Tue 11/7', 'Wed 11/8', 'Thu 11/9', 'Fri 11/10', 'Sat 11/11', 'Sun 11/12', 'Mon 11/13', 'Tue 11/14', 'Wed 11/15', 'Thu 11/16', 'Fri 11/17', 'Sat 11/18', 'Sun 11/19', 'Mon 11/20', 'Tue 11/21', 'Wed 11/22', 'Thu 11/23', 'Fri 11/24', 'Sat 11/25', 'Sun 11/26', 'Mon 11/27', 'Tue 11/28', 'Wed 11/29', 'Thu 11/30']
['0 mm', '0 mm', '0 mm', '1 mm', '3 mm', '3 mm', '13 mm', '0 mm', '0 mm', '0 mm', '0 mm', '0 mm', '\xa0', '1 mm', '3 mm', '0 mm', '1 mm', '4 mm', '2 mm', '9 mm', '2 mm', '0 mm', '1 mm', '0 mm', '0 mm', '0 mm', '0 mm', '0 mm', '1 mm', '2 mm']
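If the exact format from the question ('0mm' with no space) is needed, the strings can be tidied after scraping. This is a rough sketch using only the standard library; the helper names clean_date and clean_precip are hypothetical, and returning None for an empty cell is an assumption about how missing values should be represented.

```python
def clean_date(raw):
    """Collapse runs of whitespace to single spaces: 'Wed  11/1 ' -> 'Wed 11/1'."""
    # str.split() with no arguments splits on any whitespace.
    return " ".join(raw.split())


def clean_precip(raw):
    """Remove all whitespace: '0 mm' -> '0mm'. Blank cells become None."""
    # In Python 3 the non-breaking space '\xa0' counts as whitespace,
    # so a cell containing only '\xa0' collapses to an empty string.
    compact = "".join(raw.split())
    return compact or None  # assumption: None marks a missing value


# A few values taken from the output above.
dates = ['Wed 11/1', 'Thu 11/2', 'Mon 11/13']
precip = ['0 mm', '13 mm', '\xa0']

clean_dates = [clean_date(d) for d in dates]
clean_values = [clean_precip(p) for p in precip]

print(clean_dates)
print(clean_values)
```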

solution1
3 ACCEPTED 2017-11-14 06:39:49
