Trying to scrape an HTML span value by class, but it returns an error
def getDOW():
    DowURL = ["https://finance.yahoo.com/quote/%5EDJI?p=^DJI"]
    # requests data from the website(s) above
    page = requests.get(DowURL, headers=headers)
    # parses HTML text from the website
    soup = BeautifulSoup(page.content, "html.parser")
    # title = soup.find(class_="D(ib) Fz(18px)").get_text()
    name = soup.find(class_="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)").get_text()
    print(name)
Returns the error:
raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '['https://finance.yahoo.com/quote/%5EDJI?p=^DJI']'
I'm sorry if this has been asked before, but I'm new to BeautifulSoup, so I wasn't sure what was going on. If anybody could help or advise me, that would be much appreciated. I am trying to scrape values from several sites; I failed to do so using a list, so I made separate functions (I know how redundant that is) and ran into this error.
The immediate error comes from passing a list to requests.get: the URL must be a single string, which is why requests reports that no connection adapter was found for "['https://...']". Beyond that, when you look at the HTML source of the page, you will find that the element you are interested in is not present. The likely reason is that the content is only rendered once the page is loaded in a browser. You could use Selenium with the Chrome driver to handle this, but loading the data that way is not very efficient. I did this in the past and it was not a good solution.
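For completeness, the original function can at least be made to run by passing the URL as a plain string. A minimal sketch is below; note that the headers dict is a placeholder (the question's snippet referenced an undefined `headers`), and the CSS class string is copied from the question, so Yahoo may change it at any time and `find()` may return None:

```python
import requests
from bs4 import BeautifulSoup

# A single string, not a list -- requests.get() cannot take a list of URLs.
DOW_URL = "https://finance.yahoo.com/quote/%5EDJI?p=^DJI"

# Placeholder headers; the original snippet referenced an undefined `headers`.
HEADERS = {"User-Agent": "Mozilla/5.0"}

def get_dow():
    page = requests.get(DOW_URL, headers=HEADERS)
    soup = BeautifulSoup(page.content, "html.parser")
    # This class string is copied from the question and is fragile:
    # Yahoo regenerates these class names, so find() may return None.
    tag = soup.find(class_="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)")
    return tag.get_text() if tag else None
```

The network request only happens when `get_dow()` is actually called, and even then the selector may come back empty for the reason described above, which is why the approaches below are more robust.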
Since you seem to be interested in stock prices, you could use this approach:由于您似乎对股票价格感兴趣,您可以使用这种方法:
import yfinance as yf
import datetime
start = datetime.datetime(2019,11,15)
end = datetime.datetime(2019,11,16)
data = yf.download('^DJI', start=start, end=end)
print(data)
Result:
[*********************100%***********************] 1 of 1 downloaded
Open High Low Close Adj Close Volume
Date
2019-11-14 27757.20 27800.71 27676.97 27781.96 27781.96 303970000
2019-11-15 27843.54 28004.89 27843.54 28004.89 28004.89 283720000
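The object returned by `yf.download` is a regular pandas DataFrame, so pulling a single value out of it is straightforward. A minimal sketch on a hand-built frame of the same shape (the dates and prices below are illustrative stand-ins, not live quotes):

```python
import pandas as pd

# Illustrative stand-in for the frame returned by yf.download('^DJI', ...).
data = pd.DataFrame(
    {
        "Open": [27757.20, 27843.54],
        "High": [27800.71, 28004.89],
        "Low": [27676.97, 27843.54],
        "Close": [27781.96, 28004.89],
        "Adj Close": [27781.96, 28004.89],
        "Volume": [303970000, 283720000],
    },
    index=pd.to_datetime(["2019-11-14", "2019-11-15"]),
)
data.index.name = "Date"

# Most recent adjusted close: last row of the 'Adj Close' column.
latest_close = data["Adj Close"].iloc[-1]
print(latest_close)  # 28004.89
```

The same indexing works unchanged on the real frame that `yf.download` returns.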
I would do it like this:
import pandas as pd
from pandas_datareader import data as wb

start = '2019-02-01'
end = '2020-02-01'

tickers = ['DJIA']

price_data = []
for ticker in tickers:
    prices = wb.DataReader(ticker, start=start, end=end, data_source='yahoo')[['Open', 'Adj Close']]
    price_data.append(prices.assign(ticker=ticker)[['ticker', 'Open', 'Adj Close']])

names = pd.concat(price_data)
names = names.reset_index()
print(names)
Result:
Date ticker Open Adj Close
0 2019-02-01 DJIA 25025.310547 25063.890625
1 2019-02-04 DJIA 25062.119141 25239.369141
2 2019-02-05 DJIA 25287.929688 25411.519531
3 2019-02-06 DJIA 25371.570312 25390.300781
4 2019-02-07 DJIA 25265.810547 25169.529297
.. ... ... ... ...
247 2020-01-27 DJIA 28542.490234 28535.800781
248 2020-01-28 DJIA 28594.279297 28722.849609
249 2020-01-29 DJIA 28820.529297 28734.449219
250 2020-01-30 DJIA 28640.160156 28859.439453
251 2020-01-31 DJIA 28813.039062 28256.029297
[252 rows x 4 columns]
Note: you can pass in any tickers you want, so change this line:
tickers = ['DJIA']
To this:
tickers = ['MMM',
           'ABT',
           'ABBV',
           'ABMD',
           'ACN',
           'ATVI']
And you will get data for multiple tickers, like this.
Result:
Date ticker Open Adj Close
0 2019-02-01 MMM 201.490005 190.939163
1 2019-02-04 MMM 199.009995 191.945831
2 2019-02-05 MMM 200.720001 192.818268
3 2019-02-06 MMM 201.220001 194.208420
4 2019-02-07 MMM 201.020004 191.591080
... ... ... ...
1507 2020-01-27 ATVI 58.189999 58.320000
1508 2020-01-28 ATVI 58.790001 60.029999
1509 2020-01-29 ATVI 60.490002 60.400002
1510 2020-01-30 ATVI 59.919998 60.889999
1511 2020-01-31 ATVI 60.200001 58.480000
[1512 rows x 4 columns]
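Because the concatenated frame is in long form (one row per date per ticker), it can be reshaped to one column per ticker with pandas' `pivot`. A minimal sketch on illustrative data (the prices below are made up for the example):

```python
import pandas as pd

# Illustrative long-form frame shaped like the one built above.
names = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2019-02-01", "2019-02-01", "2019-02-04", "2019-02-04"]
        ),
        "ticker": ["MMM", "ABT", "MMM", "ABT"],
        "Adj Close": [190.94, 70.10, 191.95, 70.55],
    }
)

# Reshape: one row per date, one 'Adj Close' column per ticker.
wide = names.pivot(index="Date", columns="ticker", values="Adj Close")
print(wide)
```

This wide layout is usually what you want for per-ticker comparisons or for feeding the prices into clustering code.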