
How to get the URL of a download button and read the CSV file in Python?

I am using Python in Google Colab and trying to read the CSV file from this link: https://www.macrotrends.net/stocks/charts/AAPL/apple/stock-price-history

If you scroll down a little, you will see a download button. I'd like to get its link with Selenium or BeautifulSoup (bs) and then read the CSV file. I am trying to do something like this:

# install packages (Colab shell commands)
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

# load packages
import sys

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# run selenium and read the csv file
sys.path.insert(0, '/usr/lib/chromium-browser/chromedriver')
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# Selenium 4 takes the driver path via Service and the options via `options=`
driver = webdriver.Chrome(service=Service('/usr/bin/chromedriver'), options=chrome_options)
driver.get('https://www.macrotrends.net/stocks/charts/AAPL/apple/stock-price-history')  # put here the address of your page
btn = driver.find_element(By.TAG_NAME, 'button')  # first <button> element on the page
btn.click()
df = pd.read_csv('##.csv')  # fails here: the downloaded file's name and location are unknown

It seems to work up to the btn.click() call, but fails after that because nothing tells me the download button's link or the downloaded file's name. Could you please assist? That would be much appreciated.
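For the Selenium route in the question, the underlying problem is that the click most likely hands the file straight to Chrome's download handling, so the script never sees a URL or a file name. One common workaround is to point Chrome's downloads at a folder you control and then wait for a finished `.csv` to appear there. The helper below is a minimal sketch of the waiting part. The name `wait_for_csv_download` is hypothetical, and the sketch assumes Chrome marks in-progress downloads with a `.crdownload` suffix and that the `download.default_directory` preference (set through `ChromeOptions.add_experimental_option('prefs', ...)`) is honoured in your headless setup, which may not hold for older headless Chrome versions.

```python
import os
import time


def wait_for_csv_download(directory, timeout=30.0, poll_interval=0.5):
    """Poll `directory` until a finished .csv file appears, then return its path.

    Assumption: Chrome writes in-progress downloads with a '.crdownload'
    suffix, so a file ending in '.csv' is treated as complete.
    Raises TimeoutError if nothing appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(directory)
        finished = [n for n in names if n.endswith('.csv')]
        in_progress = [n for n in names if n.endswith('.crdownload')]
        # Only return once no partial download is still being written.
        if finished and not in_progress:
            paths = [os.path.join(directory, n) for n in finished]
            # If several CSVs exist, pick the most recently modified one.
            return max(paths, key=os.path.getmtime)
        time.sleep(poll_interval)
    raise TimeoutError(f'no .csv file appeared in {directory} within {timeout}s')


# Hypothetical usage with the question's Selenium setup (not run here):
#   download_dir = tempfile.mkdtemp()
#   chrome_options.add_experimental_option(
#       'prefs', {'download.default_directory': download_dir})
#   ... create the driver, open the page, btn.click() ...
#   df = pd.read_csv(wait_for_csv_download(download_dir))
```

Polling the folder avoids having to know the file name in advance, which is exactly the piece the `'##.csv'` placeholder is missing.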

No need for Selenium. The data is already embedded in the page's <script> tags, so it can be fetched with requests and parsed with BeautifulSoup.
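The key step is pulling a JavaScript array literal out of a script and handing it to `json.loads`. The sketch below shows that step on a hypothetical page fragment, swapping BeautifulSoup for a plain regular expression so it runs with the standard library only. It anchors on the variable name (`dataDaily`) rather than on the first `[` in the script, so an earlier array in the same tag is not picked up by mistake. The helper name `extract_js_array` is hypothetical, and it assumes the array is valid JSON and contains no literal `];` inside string values.

```python
import json
import re


def extract_js_array(html, var_name):
    """Return the array assigned to `var <var_name> = [...];` in `html`, parsed as JSON."""
    # Non-greedy match up to the first ']' that is directly followed by ';'.
    pattern = r'var\s+' + re.escape(var_name) + r'\s*=\s*(\[.*?\]);'
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        raise ValueError(f'variable {var_name!r} not found')
    return json.loads(match.group(1))


# Hypothetical page fragment; the numbers are placeholders, not real prices.
SAMPLE_HTML = '''
<script type="text/javascript">
var other = [0];
var dataDaily = [{"d": "2021-03-02", "o": 1.0, "c": 2.0},
                 {"d": "2021-03-03", "o": 2.0, "c": 3.0}];
</script>
'''

rows = extract_js_array(SAMPLE_HTML, 'dataDaily')
```

On this fragment, splitting on the first `[` would return `[0]` from `var other`, while the name-anchored pattern returns the two `dataDaily` rows.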

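import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

t = 'AAPL'
url = 'https://www.macrotrends.net/assets/php/stock_price_history.php?t={}'.format(t)

response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
soup = BeautifulSoup(response.text, 'html.parser')

# The daily price series is assigned to a JavaScript variable inside a <script> tag.
jsonData = None
scripts = soup.find_all('script', {'type': 'text/javascript'})
for script in scripts:
    if 'var dataDaily' in str(script):
        # Take everything from the first '[' up to the closing '];' and parse it as JSON.
        jsonStr = '[' + str(script).split('[', 1)[-1].split('];')[0] + ']'
        jsonData = json.loads(jsonStr)
        break

if jsonData is None:
    raise RuntimeError('dataDaily not found; the page layout may have changed')

df = pd.DataFrame(jsonData)
# Expand the single-letter keys into readable column names.
df = df.rename(columns={'o': 'open', 'h': 'high', 'l': 'low', 'c': 'close', 'd': 'date', 'v': 'volume'})
df.to_csv('MacroTrends_Data_Download_{}.csv'.format(t), index=False)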

Output:

print(df)
             date      open      high  ...   volume     ma50    ma200
0      1980-12-12    0.1012    0.1016  ...  469.034      NaN      NaN
1      1980-12-15    0.0964    0.0964  ...  175.885      NaN      NaN
2      1980-12-16    0.0893    0.0893  ...  105.728      NaN      NaN
3      1980-12-17    0.0910    0.0915  ...   86.442      NaN      NaN
4      1980-12-18    0.0937    0.0941  ...   73.450      NaN      NaN
          ...       ...       ...  ...      ...      ...      ...
10135  2021-02-25  124.6800  126.4585  ...  148.200  131.845  112.241
10136  2021-02-26  122.5900  124.8500  ...  164.560  131.838  112.460
10137  2021-03-01  123.7500  127.9300  ...  116.308  131.840  112.716
10138  2021-03-02  128.4100  128.7200  ...  102.261  131.790  112.957
10139  2021-03-03  124.8100  125.7100  ...  111.514  131.661  113.184

[10140 rows x 8 columns]
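The question's end goal was `pd.read_csv`. Once the answer's code has written the CSV, reading it back is an ordinary `read_csv` call. The sketch below runs the same rename and a write/read round trip on two hypothetical rows (placeholder numbers, not real prices) using an in-memory buffer, and parses `date` into a `DatetimeIndex` so the result is ready for time-series work. Passing the `MacroTrends_Data_Download_AAPL.csv` path written above instead of the buffer should behave the same way.

```python
import io

import pandas as pd

# Hypothetical rows in the same shape as jsonData (short keys d/o/h/l/c/v);
# the numbers are placeholders, not real AAPL prices.
json_data = [
    {'d': '2021-03-01', 'o': 1.0, 'h': 2.0, 'l': 0.5, 'c': 1.5, 'v': 100.0},
    {'d': '2021-03-02', 'o': 1.5, 'h': 2.5, 'l': 1.0, 'c': 2.0, 'v': 120.0},
]

COLUMN_NAMES = {'o': 'open', 'h': 'high', 'l': 'low',
                'c': 'close', 'd': 'date', 'v': 'volume'}

df = pd.DataFrame(json_data).rename(columns=COLUMN_NAMES)

# Write to an in-memory buffer so the sketch has no side effects on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Read it back, parsing the date column and using it as the index.
loaded = pd.read_csv(buffer, parse_dates=['date'], index_col='date')
```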

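Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.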

 
© 2020-2024 STACKOOM.COM