
Trying to scrape an HTML span value by class, but returns error

def getDOW():
    DowURL = ["https://finance.yahoo.com/quote/%5EDJI?p=^DJI"]
    # requests data on the website(s) above
    page = requests.get(DowURL, headers=headers)

    # parses HTML text from website
    soup = BeautifulSoup(page.content, "html.parser")

    # title = soup.find(class_="D(ib) Fz(18px)").get_text()
    name = soup.find(class_= "Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)").get_text()
    print (name)

Returns the error:

raise InvalidSchema("No connection adapters were found for '%s' % url")
requests.exceptions.InvalidSchema: No connection adapters were found for '['https://finance.yahoo.com/quote/%5EDJI?p=^DJI']'

I'm sorry if this was already asked before, but I'm new to BeautifulSoup, so I wasn't sure what was going on. If anybody could help or advise me, that would be much appreciated. I am trying to scrape values from several sites; I failed to do so using a list, so I made separate functions (I know how redundant that is) and ran into this error.
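
The traceback itself points at the direct cause: requests.get is being handed a Python list rather than a URL string, so requests cannot find a connection adapter for it. A minimal sketch of handling several sites, assuming a placeholder headers dict (substitute whatever the real script defines), is to keep the URLs in a list and loop over them:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}  # placeholder; use the headers the original script defines

urls = ["https://finance.yahoo.com/quote/%5EDJI?p=^DJI"]

for url in urls:
    # requests.get expects a single URL string, not a list of URLs
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    tag = soup.find(class_="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)")
    print(tag.get_text() if tag else "span not found")

Note that even with the URL fixed, the find call can still return None, which is what the answer below addresses.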

When you look at the HTML source of the page, you will find that the object you are interested in is not present. The likely reason is that the content is only loaded once you open the page in a browser. You can use Selenium and the Chrome driver to get around this; however, loading the data like that is not very efficient. I did this in the past and it was not a good solution.
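
For completeness, a minimal Selenium sketch of that approach (assuming chromedriver is installed and on the PATH, and reusing the class string from the question) might look like this:

import time
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://finance.yahoo.com/quote/%5EDJI?p=^DJI")
time.sleep(5)  # crude wait so the JavaScript-rendered quote has time to appear
soup = BeautifulSoup(driver.page_source, "html.parser")
tag = soup.find(class_="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)")
print(tag.get_text() if tag else "span not found")
driver.quit()

As noted above, though, spinning up a browser just to read one number is heavy; the stock-data approach below avoids it entirely.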

Since you seem to be interested in stock prices, you could use this approach:

import yfinance as yf
import datetime

start = datetime.datetime(2019,11,15)
end = datetime.datetime(2019,11,16)
data = yf.download('^DJI', start=start, end=end)
print(data)

Result:

[*********************100%***********************]  1 of 1 downloaded
                Open      High       Low     Close  Adj Close     Volume
Date
2019-11-14  27757.20  27800.71  27676.97  27781.96   27781.96  303970000
2019-11-15  27843.54  28004.89  27843.54  28004.89   28004.89  283720000
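
If you only need the most recent value rather than the whole history, the same DataFrame can be queried directly. A small sketch, assuming data is the DataFrame returned by yf.download above with the flat column layout shown in the result:

latest_close = data["Close"].iloc[-1]  # last row of the Close column
print(f"Latest Dow Jones close: {latest_close:.2f}")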

I would do it like this.

import pandas as pd
from pandas_datareader import data as wb


start = '2019-02-01'
end = '2020-02-01'

tickers = ['DJIA']

price_data = []
for ticker in tickers:
    # pull Open and Adj Close for each ticker from Yahoo Finance
    prices = wb.DataReader(ticker, start=start, end=end, data_source='yahoo')[['Open', 'Adj Close']]
    price_data.append(prices.assign(ticker=ticker)[['ticker', 'Open', 'Adj Close']])

names = pd.concat(price_data)
names = names.reset_index()   # reset_index returns a new DataFrame, so reassign it
print(names)

Result:

          Date ticker          Open     Adj Close
0   2019-02-01   DJIA  25025.310547  25063.890625
1   2019-02-04   DJIA  25062.119141  25239.369141
2   2019-02-05   DJIA  25287.929688  25411.519531
3   2019-02-06   DJIA  25371.570312  25390.300781
4   2019-02-07   DJIA  25265.810547  25169.529297
..         ...    ...           ...           ...
247 2020-01-27   DJIA  28542.490234  28535.800781
248 2020-01-28   DJIA  28594.279297  28722.849609
249 2020-01-29   DJIA  28820.529297  28734.449219
250 2020-01-30   DJIA  28640.160156  28859.439453
251 2020-01-31   DJIA  28813.039062  28256.029297

[252 rows x 4 columns]

Note: you can pass in any tickers you want, so change this line:

tickers = ['DJIA']

To this:

tickers = ['MMM',
'ABT',
'ABBV',
'ABMD',
'ACN',
'ATVI']

And you will get data for multiple tickers, like this.

Result:

           Date ticker        Open   Adj Close
0    2019-02-01    MMM  201.490005  190.939163
1    2019-02-04    MMM  199.009995  191.945831
2    2019-02-05    MMM  200.720001  192.818268
3    2019-02-06    MMM  201.220001  194.208420
4    2019-02-07    MMM  201.020004  191.591080
...          ...    ...         ...         ...
1507 2020-01-27   ATVI   58.189999   58.320000
1508 2020-01-28   ATVI   58.790001   60.029999
1509 2020-01-29   ATVI   60.490002   60.400002
1510 2020-01-30   ATVI   59.919998   60.889999
1511 2020-01-31   ATVI   60.200001   58.480000

[1512 rows x 4 columns]
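
If you then want one column per ticker (for example, to compare the adjusted closes side by side), a short follow-up sketch, assuming the names DataFrame built above:

wide = names.pivot(index='Date', columns='ticker', values='Adj Close')
print(wide.head())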
