
Trouble scraping a specific element from a page using BeautifulSoup in Python

I am new to Python and am looking into scraping HTML with the BeautifulSoup library.

I need to fetch the date field as day plus date, and the precip field as the value together with its measuring unit.

Python code

dates=[]
Precip=[]
for row in right_table.findAll("tr"):
    cells = row.findAll('td')
    th_cells=row.findAll('th') #To store second column data
    if len(cells)==5:
        Precip.append(cells[1].find(text=True))
        dates.append(th_cells[0].find(text=True))
print(dates)
print(Precip)

Code Output

['Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ']
['0 ', '0 ', '0 ', '1 ', '3 ', '3 ', '13 ', '0 ', '0 ', '0 ', '0 ', '0 ', '\xa0', '1 ', '3 ', '0 ', '1 ', '4 ', '2 ', '9 ', '2 ', '0 ', '1 ', '0 ', '0 ', '0 ', '0 ', '0 ', '1 ', '2 ']

Required Output

['Wed 11/1','Thur 11/2'.......]

['0mm','0mm'....]

Below is the HTML that I am trying to parse

HTML

<class 'list'>: ['\\n',
<thead>
  <tr> <th>Date</th> <th>Hi/Lo</th> <th>Precip</th> <th>Snow</th> <th>Forecast</th> <th>Avg. HI / LO</th> </tr>
</thead>, '\\n',
<tbody>
  <tr class="pre">
    <th scope="row">Wed <time>11/1</time></th>
    <td>25°/20°</td>
    <td>0 <span class="small">mm</span></td>
    <td>0 <span class="small">CM</span></td>
    <td> </td>
    <td>28°/18°</td>
  </tr>
  <tr class="pre">
    <th scope="row">Thu <time>11/2</time></th>
    <td>28°/19°</td>
    <td>0 <span class="small">mm</span></td>
    <td>0 <span class="small">CM</span></td>
    <td> </td>
    <td>27°/18°</td>
  </tr>

I'd use .text instead of .find(text=True). At the moment you only get the first text node of each cell, so the content of child tags such as <time> and <span> is left out.
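To see the difference in isolation, here is a minimal sketch that parses a single row copied from the HTML in the question, so it runs without a network request. The variable names are only illustrative, and `string=True` is used as the newer spelling of `text=True` (available in recent BeautifulSoup 4 releases).

```python
from bs4 import BeautifulSoup

# One table row copied from the HTML in the question.
html = """
<table><tbody>
<tr class="pre">
  <th scope="row">Wed <time>11/1</time></th>
  <td>25°/20°</td>
  <td>0 <span class="small">mm</span></td>
  <td>0 <span class="small">CM</span></td>
  <td> </td>
  <td>28°/18°</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")
th = row.find("th")
precip_cell = row.find_all("td")[1]

# find(string=True) returns only the first text node inside the tag.
first_text_only = th.find(string=True)

# .text concatenates every text node, including those inside child tags.
full_text = th.text
precip_text = precip_cell.text

print(repr(first_text_only))  # 'Wed '
print(repr(full_text))        # 'Wed 11/1'
print(repr(precip_text))      # '0 mm'
```

The first text node stops at the `<time>` tag, which is why the original output contained only 'Wed ' and '0 '.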

from bs4 import BeautifulSoup
import requests

html = requests.get("https://www.accuweather.com/en/in/bengaluru/204108/month/204108?view=table").text
soup = BeautifulSoup(html, 'html.parser')



right_table = soup.find("tbody")
dates = []
Precip = []
for row in right_table.find_all("tr"):
    cells = row.find_all("td")
    th_cells = row.find_all("th")
    if len(cells) == 5:
        # .text joins every text node in the cell, including child tags
        Precip.append(cells[1].text)    # second <td>: precipitation and unit
        dates.append(th_cells[0].text)  # row header: day and date
print(dates)
print(Precip)

This produces the following output, with the date and the unit included:

['Wed 11/1', 'Thu 11/2', 'Fri 11/3', 'Sat 11/4', 'Sun 11/5', 'Mon 11/6', 'Tue 11/7', 'Wed 11/8', 'Thu 11/9', 'Fri 11/10', 'Sat 11/11', 'Sun 11/12', 'Mon 11/13', 'Tue 11/14', 'Wed 11/15', 'Thu 11/16', 'Fri 11/17', 'Sat 11/18', 'Sun 11/19', 'Mon 11/20', 'Tue 11/21', 'Wed 11/22', 'Thu 11/23', 'Fri 11/24', 'Sat 11/25', 'Sun 11/26', 'Mon 11/27', 'Tue 11/28', 'Wed 11/29', 'Thu 11/30']
['0 mm', '0 mm', '0 mm', '1 mm', '3 mm', '3 mm', '13 mm', '0 mm', '0 mm', '0 mm', '0 mm', '0 mm', '\xa0', '1 mm', '3 mm', '0 mm', '1 mm', '4 mm', '2 mm', '9 mm', '2 mm', '0 mm', '1 mm', '0 mm', '0 mm', '0 mm', '0 mm', '0 mm', '1 mm', '2 mm']
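The required output in the question has no space between value and unit ('0mm'), and one entry above is a bare non-breaking space ('\xa0') from an empty cell. If that matters, a small post-processing step could tidy the list. This is only a sketch: `normalize_precip` is a hypothetical helper, and representing an empty cell as `None` is an assumption, since the question does not say how blanks should be handled.

```python
def normalize_precip(value):
    # str.split() with no arguments splits on any Unicode whitespace,
    # which includes the non-breaking space, so joining the pieces
    # removes the gap between value and unit.
    compact = "".join(value.split())
    # Assumption: an empty cell is represented as None.
    return compact or None


# A few values taken from the output above.
precip = ['0 mm', '13 mm', '\xa0', '1 mm']
cleaned = [normalize_precip(p) for p in precip]
print(cleaned)  # ['0mm', '13mm', None, '1mm']
```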
