How to get the URL of the download button and read the CSV file in Python?
I am using Python in Google Colab and trying to read the CSV file from this link: https://www.macrotrends.net/stocks/charts/AAPL/apple/stock-price-history
If you scroll down a little, you will see a download button. I'd like to get its link using selenium or bs and read the CSV file. I am trying to do something like this:
# install packages
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
# load packages
import pandas as pd
from selenium import webdriver
import sys
# run selenium and read the csv file
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
driver.get('https://www.macrotrends.net/stocks/charts/AAPL/apple/stock-price-history')  # put the address of your page here
btn = driver.find_element_by_tag_name('button')
btn.click()
df = pd.read_csv('##.csv')
It seems to work up to the btn.click() call, but I get an error after that, because it gives me neither the link behind the download button nor the file name. Could you please assist? That would be much appreciated.
No need for selenium. The data is embedded in the <script> tags.
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
t = 'AAPL'
url = 'https://www.macrotrends.net/assets/php/stock_price_history.php?t={}'.format(t)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
scripts = soup.find_all('script',{'type':'text/javascript'})
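for script in scripts:
    # the daily price series is assigned to a JavaScript variable named dataDaily
    if 'var dataDaily' in str(script):
        # keep only the text between the first '[' and the closing '];'
        jsonStr = '[' + str(script).split('[',1)[-1].split('];')[0] + ']'
        jsonData = json.loads(jsonStr)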
df = pd.DataFrame(jsonData)
df = df.rename(columns={'o':'open','h':'high','l':'low','c':'close','d':'date','v':'volume'})
df.to_csv('MacroTrends_Data_Download_{}.csv'.format(t), index=False)
Output:
print(df)
             date      open      high  ...   volume     ma50    ma200
    0  1980-12-12    0.1012    0.1016  ...  469.034      NaN      NaN
    1  1980-12-15    0.0964    0.0964  ...  175.885      NaN      NaN
    2  1980-12-16    0.0893    0.0893  ...  105.728      NaN      NaN
    3  1980-12-17    0.0910    0.0915  ...   86.442      NaN      NaN
    4  1980-12-18    0.0937    0.0941  ...   73.450      NaN      NaN
  ...         ...       ...       ...  ...      ...      ...
10135  2021-02-25  124.6800  126.4585  ...  148.200  131.845  112.241
10136  2021-02-26  122.5900  124.8500  ...  164.560  131.838  112.460
10137  2021-03-01  123.7500  127.9300  ...  116.308  131.840  112.716
10138  2021-03-02  128.4100  128.7200  ...  102.261  131.790  112.957
10139  2021-03-03  124.8100  125.7100  ...  111.514  131.661  113.184

[10140 rows x 8 columns]
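
The string splitting in the answer can also be written as a small helper that returns None when the variable is missing, which avoids a NameError if the page layout changes. This is only a sketch: the function name extract_data_daily and the sample HTML are made up for illustration, and it assumes the page assigns the data as `var dataDaily = [...];`, which is what the split on '[' and '];' above implies.

```python
import json
import re


def extract_data_daily(html):
    """Return the list assigned to `var dataDaily` in the page source, or None."""
    # Non-greedy match from the opening '[' up to the first '];' after it.
    match = re.search(r"var\s+dataDaily\s*=\s*(\[.*?\]);", html, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


# Made-up sample shaped like the embedded data (not real prices).
sample_html = """
<script type="text/javascript">
var dataDaily = [{"d":"2000-01-03","o":"1.0","c":"1.5"},{"d":"2000-01-04","o":"1.5","c":"1.2"}];
</script>
"""

rows = extract_data_daily(sample_html)
print(rows)
```

With the real page you would pass response.text to the helper and hand the result to pd.DataFrame, as in the answer.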