Beautiful Soup loop over div elements in HTML

Question

I am attempting to use Beautiful Soup to extract some values out of a web page (not very much wisdom here...): the hourly values from a WeatherBug forecast. In Chrome developer mode I can see that the values are nested within div classes, as shown in the snip below:

In Python I can attempt to mimic a web browser and find these values:

import requests
import pandas as pd
from bs4 import BeautifulSoup  # the only BeautifulSoup import needed

url = 'https://www.weatherbug.com/weather-forecast/hourly/san-francisco-ca-94103'

header = {
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
  "X-Requested-With": "XMLHttpRequest"
}

page = requests.get(url, headers=header)

soup = BeautifulSoup(page.text, 'html.parser')

With the code below, I can find 12 of these hour-card__mobile__cond divs, which seems about right: when searching for the hourly forecast I can see 12 hours of future data. I'm not sure why I am picking up the mobile-device version of the page... (?)

temp_containers = soup.find_all('div', class_ = 'hour-card__mobile__cond')
print(type(temp_containers))
print(len(temp_containers))

Output:

<class 'bs4.element.ResultSet'>
12

I am doing something wrong below when I try to loop through all these div elements to dig a little further: I get 12 empty lists back. Does anyone have a tip on where I can improve? Ultimately I am looking to put all 12 hourly forecast values into a pandas DataFrame.

for div in temp_containers:
    a = div.find_all('div', class_ = 'temp ng-binding')
    print(a)
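A minimal offline reproduction of the empty lists, assuming (as Answer 2 reports) that the HTML returned to requests marks each temperature with class="temp" only, without "ng-binding". The SAMPLE_HTML markup is hypothetical and trimmed to two cards.

```python
from bs4 import BeautifulSoup

# Hypothetical server-rendered markup: there is no "ng-binding" class here.
SAMPLE_HTML = """
<div class="hour-card__mobile__cond"><div class="temp">51°</div></div>
<div class="hour-card__mobile__cond"><div class="temp">52°</div></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
cards = soup.find_all("div", class_="hour-card__mobile__cond")

# The question's search: "temp ng-binding" never matches these divs,
# so every card yields an empty result.
question_results = [card.find_all("div", class_="temp ng-binding") for card in cards]

# Searching for the single class that is actually present finds one div per card.
fixed_results = [card.find_all("div", class_="temp") for card in cards]

print([len(r) for r in question_results])  # → [0, 0]
print([len(r) for r in fixed_results])     # → [1, 1]
```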

EDIT: complete code based on the answer, with a pandas DataFrame:

import requests
from bs4 import BeautifulSoup
import pandas as pd


r = requests.get(
    "https://www.weatherbug.com/weather-forecast/hourly/san-francisco-ca-94103")
soup = BeautifulSoup(r.text, 'html.parser')

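stuff = []

for item in soup.select("div.hour-card__mobile__cond"):
    # assumes contents[1] is the temperature element inside each card;
    # get_text(strip=True) trims whitespace and [:-1] drops the trailing "°"
    item = int(item.contents[1].get_text(strip=True)[:-1])
    print(item)
    stuff.append(item)


# one row per forecast hour, in a single "temp" column
df = pd.DataFrame({'temp': stuff})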
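The contents[1] lookup depends on the exact child order and whitespace inside each card. A sketch that instead selects the div.temp child by class and builds the DataFrame in one step; cards_to_dataframe is a hypothetical helper, and the sample markup is assumed, not copied from the live page.

```python
import pandas as pd
from bs4 import BeautifulSoup


def cards_to_dataframe(html: str) -> pd.DataFrame:
    """Collect one integer temperature per hourly card into a 'temp' column.

    Hypothetical helper: assumes each card holds a <div class="temp"> child
    whose text looks like "51°".
    """
    soup = BeautifulSoup(html, "html.parser")
    temps = []
    for card in soup.select("div.hour-card__mobile__cond"):
        temp_div = card.select_one("div.temp")
        if temp_div is None:  # skip cards without a temperature element
            continue
        # strip whitespace, then remove the trailing degree sign
        temps.append(int(temp_div.get_text(strip=True).rstrip("°")))
    return pd.DataFrame({"temp": temps})


# Hypothetical markup: whitespace around the value no longer matters.
sample = """
<div class="hour-card__mobile__cond">
  <div class="temp">
      51°
  </div>
</div>
<div class="hour-card__mobile__cond"><div class="temp">52°</div></div>
"""
print(cards_to_dataframe(sample))
```

Selecting by class rather than by position keeps working if the site adds or reorders sibling elements inside a card.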

Answer 1

The website's content is loaded dynamically via JavaScript once the page loads, so you can use requests-html or Selenium.

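from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('--headless')  # run Firefox without opening a window
driver = webdriver.Firefox(options=options)

try:
    driver.get(
        "https://www.weatherbug.com/weather-forecast/hourly/san-francisco-ca-94103")

    # find_elements_by_css_selector() no longer exists in current Selenium 4
    # releases; find_elements(By.CSS_SELECTOR, ...) is the replacement
    data = driver.find_elements(By.CSS_SELECTOR, "div.temp.ng-binding")

    for item in data:
        print(item.text)
finally:
    driver.quit()  # always close the browser, even if scraping fails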

Output:

51°
52°
53°
54°
53°
53°
52°
51°
51°
50°
50°
49°
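Whether JavaScript rendering is really needed can be checked by testing if a selector matches anything in the HTML that requests receives. The helper selector_in_static_html and the static_html string are hypothetical; in practice you would pass r.text from requests.get().

```python
from bs4 import BeautifulSoup


def selector_in_static_html(html: str, css_selector: str) -> bool:
    """Return True if css_selector matches at least one element in html.

    Hypothetical helper: if this is False for markup you can see in the
    browser's developer tools, that markup (or that class) was most likely
    added by JavaScript after the page loaded.
    """
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(css_selector) is not None


# Hypothetical server response: the "ng-binding" class is absent.
static_html = '<div class="hour-card__mobile__cond"><div class="temp">51°</div></div>'

print(selector_in_static_html(static_html, "div.temp"))             # → True
print(selector_in_static_html(static_html, "div.temp.ng-binding"))  # → False
```

If the plain "div.temp" selector matches, plain requests plus Beautiful Soup is enough for these values, which is what the requests-only version in this answer relies on.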

Updated per the user's request:

import requests
from bs4 import BeautifulSoup

r = requests.get(
    "https://www.weatherbug.com/weather-forecast/hourly/san-francisco-ca-94103")
soup = BeautifulSoup(r.text, 'html.parser')

for item in soup.select("div.hour-card__mobile__cond"):
    item = int(item.contents[1].get_text(strip=True)[:-1])
    print(item, type(item))

Output:

51 <class 'int'>
52 <class 'int'>
53 <class 'int'>
53 <class 'int'>
53 <class 'int'>
53 <class 'int'>
52 <class 'int'>
51 <class 'int'>
51 <class 'int'>
50 <class 'int'>
50 <class 'int'>
50 <class 'int'>
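The [:-1] slice assumes the degree sign is always the last character of the stripped text. A slightly more defensive parser, sketched here with a regular expression (parse_temp is a hypothetical name), takes the first signed integer in the string and raises an error if there is none.

```python
import re

# An optional minus sign followed by one or more digits.
_TEMP_RE = re.compile(r"-?\d+")


def parse_temp(text: str) -> int:
    """Extract an integer temperature from text such as '51°' or ' -3°F '."""
    match = _TEMP_RE.search(text)
    if match is None:
        raise ValueError(f"no temperature found in {text!r}")
    return int(match.group())


print([parse_temp(t) for t in ["51°", "\n   52°\n", "-3°F"]])  # → [51, 52, -3]
```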

Answer 2

When you see class="temp ng-binding", it means the div has two classes, "temp" and "ng-binding". Passing the combined string to class_ only matches an element whose class attribute is exactly that text, so it will not match a div that carries only "temp". Also, when I ran your script, the HTML of the temperature containers looked like this:

print(temp_containers[0])

<div class="temp">
                    51°
</div>

So I ran this and got results:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.weatherbug.com/weather-forecast/hourly/san-francisco-ca-94103'

header = {
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
  "X-Requested-With": "XMLHttpRequest"
}

page = requests.get(url, headers=header)

soup = BeautifulSoup(page.text, 'html.parser')

temp_containers = soup.find_all('div', class_ = 'hour-card__mobile__cond')
print(type(temp_containers))
print(len(temp_containers))

for div in temp_containers:
    a = div.find('div', class_ = 'temp')
    # get_text(strip=True) drops the whitespace around the value, e.g. "51°"
    print(a.get_text(strip=True))
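How Beautiful Soup treats a multi-valued class attribute can be checked offline. In this sketch the markup is hypothetical and mimics what the browser shows (both classes present): class_="temp" matches any element carrying that class, a space-separated string only matches the attribute's exact text, and a CSS selector such as div.temp.ng-binding requires both classes in any order.

```python
from bs4 import BeautifulSoup

# Hypothetical browser-rendered markup with two classes on each div.
html = """
<div class="temp ng-binding">51°</div>
<div class="ng-binding temp">52°</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any element that has that class.
by_single_class = soup.find_all("div", class_="temp")
# A multi-class string is compared with the attribute's exact text,
# so the div listing the classes in the other order is not found.
by_exact_string = soup.find_all("div", class_="temp ng-binding")
# A CSS selector requires both classes, regardless of their order.
by_css = soup.select("div.temp.ng-binding")

print(len(by_single_class), len(by_exact_string), len(by_css))  # → 2 1 2
```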

Answer 1: score 1, accepted, 2020-03-17 19:15:32
Answer 2: score 0, 2020-03-17 19:23:44