Trouble scraping a specific element from a page using BeautifulSoup in Python

I am new to Python and am looking into scraping HTML with the Python BeautifulSoup library.

I need to fetch the date field value as day and date (for example, Wed 11/1), and the Precip field value together with its unit of measurement.

Python code

dates=[]
Precip=[]
for row in right_table.findAll("tr"):
    cells = row.findAll('td')
    th_cells=row.findAll('th') #To store second column data
    if len(cells)==5:
        Precip.append(cells[1].find(text=True))
        dates.append(th_cells[0].find(text=True))
print(dates)
print(Precip)

Code Output

['Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ', 'Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ']
['0 ', '0 ', '0 ', '1 ', '3 ', '3 ', '13 ', '0 ', '0 ', '0 ', '0 ', '0 ', '\xa0', '1 ', '3 ', '0 ', '1 ', '4 ', '2 ', '9 ', '2 ', '0 ', '1 ', '0 ', '0 ', '0 ', '0 ', '0 ', '1 ', '2 ']

Required Output

['Wed 11/1','Thur 11/2'.......]

['0mm','0mm'....]

Below is the HTML I am trying to parse:

HTML

<thead>
  <tr>
    <th>Date</th>
    <th>Hi/Lo</th>
    <th>Precip</th>
    <th>Snow</th>
    <th>Forecast</th>
    <th>Avg. HI / LO</th>
  </tr>
</thead>
<tbody>
  <tr class="pre">
    <th scope="row">Wed <time>11/1</time></th>
    <td>25°/20°</td>
    <td>0 <span class="small">mm</span></td>
    <td>0 <span class="small">CM</span></td>
    <td> </td>
    <td>28°/18°</td>
  </tr>
  <tr class="pre">
    <th scope="row">Thu <time>11/2</time></th>
    <td>28°/19°</td>
    <td>0 <span class="small">mm</span></td>
    <td>0 <span class="small">CM</span></td>
    <td> </td>
    <td>27°/18°</td>
  </tr>

I'd use .text instead of .find(text=True). What's currently happening is that .find(text=True) returns only the first text node inside the tag, so you never fetch the content of subtags such as <time> (or the <span> holding the unit). .text joins the text of all descendants.
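To see the difference in isolation, here is a minimal offline sketch that parses one row copied from the HTML in the question, so no network request is needed. It uses the string= argument, which is the newer name for text= in Beautiful Soup 4.4.0 and later; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# One <tbody> row copied from the HTML shown in the question.
row_html = """
<table><tbody>
<tr class="pre">
  <th scope="row">Wed <time>11/1</time></th>
  <td>25°/20°</td>
  <td>0 <span class="small">mm</span></td>
  <td>0 <span class="small">CM</span></td>
  <td> </td>
  <td>28°/18°</td>
</tr>
</tbody></table>
"""

row = BeautifulSoup(row_html, "html.parser").find("tr")
date_cell = row.find("th")             # row header holding the date
precip_cell = row.find_all("td")[1]    # second <td> is the Precip column

# find(string=True) stops at the FIRST text node, so the text inside
# <time> and <span> is never reached.
print([date_cell.find(string=True), precip_cell.find(string=True)])  # ['Wed ', '0 ']

# get_text() (which .text calls) concatenates every descendant text node.
print([date_cell.get_text(), precip_cell.get_text()])  # ['Wed 11/1', '0 mm']
```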

from bs4 import BeautifulSoup
import requests

html = requests.get("https://www.accuweather.com/en/in/bengaluru/204108/month/204108?view=table").text
soup = BeautifulSoup(html, 'html.parser')

right_table = soup.find("tbody")
dates = []
Precip = []
for row in right_table.find_all("tr"):
    cells = row.find_all('td')     # Hi/Lo, Precip, Snow, Forecast, Avg. HI / LO
    th_cells = row.find_all('th')  # row header cell holding the date
    if len(cells) == 5:
        # .text includes the text inside child tags such as <time> and <span>
        Precip.append(cells[1].text)    # second <td> is the Precip column
        dates.append(th_cells[0].text)
print(dates)
print(Precip)

This gets the correct output:

['Wed 11/1', 'Thu 11/2', 'Fri 11/3', 'Sat 11/4', 'Sun 11/5', 'Mon 11/6', 'Tue 11/7', 'Wed 11/8', 'Thu 11/9', 'Fri 11/10', 'Sat 11/11', 'Sun 11/12', 'Mon 11/13', 'Tue 11/14', 'Wed 11/15', 'Thu 11/16', 'Fri 11/17', 'Sat 11/18', 'Sun 11/19', 'Mon 11/20', 'Tue 11/21', 'Wed 11/22', 'Thu 11/23', 'Fri 11/24', 'Sat 11/25', 'Sun 11/26', 'Mon 11/27', 'Tue 11/28', 'Wed 11/29', 'Thu 11/30']
['0 mm', '0 mm', '0 mm', '1 mm', '3 mm', '3 mm', '13 mm', '0 mm', '0 mm', '0 mm', '0 mm', '0 mm', '\xa0', '1 mm', '3 mm', '0 mm', '1 mm', '4 mm', '2 mm', '9 mm', '2 mm', '0 mm', '1 mm', '0 mm', '0 mm', '0 mm', '0 mm', '0 mm', '1 mm', '2 mm']
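The output above is still slightly different from the format requested in the question ('0mm' rather than '0 mm'), and the row with no precipitation value comes back as '\xa0' (a non-breaking space). A small post-processing step could close that gap; this is a sketch using a hypothetical helper, normalize_precip, and it assumes a blank cell should become None.

```python
# A few raw values taken from the answer's output above.
raw_precip = ['0 mm', '1 mm', '13 mm', '\xa0', '2 mm']

def normalize_precip(value):
    """Return '0mm' for '0 mm', or None when the cell is blank.

    str.split() with no arguments splits on any Unicode whitespace,
    including the non-breaking space '\\xa0', so joining the pieces
    removes every space and turns a blank cell into ''.
    """
    compact = ''.join(value.split())
    return compact or None  # assumption: a blank cell means "no data"

precip = [normalize_precip(v) for v in raw_precip]
print(precip)  # ['0mm', '1mm', '13mm', None, '2mm']
```

Applying the same helper to the full Precip list from the answer yields values in the '0mm' form the question asks for.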
