
Downloading csv file from bseindia using python

I want to download Results.csv from 'https://www.bseindia.com/corporates/Forth_Results.aspx'. Basically, I want to get the data as a dataframe. I used the code below to download the file, but it returns the wrong data.

import requests
import pandas as pd
bse_url = 'https://www.bseindia.com/corporates/Forth_Results.aspx'
r = requests.get(bse_url)
file_name = 'Results.csv'

with open(file_name, 'wb') as f:
    for chunk in r.iter_content(): 
        f.write(chunk)
        f.flush()
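
A likely explanation for the wrong data, offered as an assumption rather than something confirmed in this thread: `Forth_Results.aspx` is the HTML page that hosts the download link, so a plain `requests.get` saves that page's markup instead of the CSV. The sketch below uses a hypothetical helper, `looks_like_html`, to check what was actually downloaded before handing it to pandas. The two payloads are made up for illustration and are not real BSE responses.

```python
def looks_like_html(payload: bytes) -> bool:
    """Return True if the payload appears to be an HTML page rather than CSV."""
    # Only inspect the start of the body; ignore leading whitespace and case.
    head = payload[:512].lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html")) or b"<html" in head


# Illustrative payloads (assumptions, not captured from the real site)
html_payload = b"\r\n<!DOCTYPE html>\r\n<html><head><title>Results</title></head></html>"
csv_payload = b"Security Code,Security Name,Company name,Result Date\n500875,ITC,ITC LTD.,24 Jul 2020\n"

print(looks_like_html(html_payload))
print(looks_like_html(csv_payload))
```

With the code above, the check would be called as `looks_like_html(r.content)` before writing the file.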

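You can do this with the help of selenium. Follow these steps:

Step 1: Download the web driver for Chrome:

First check your Chrome version (browser menu (three vertical dots) -> Help -> About Google Chrome).

Step 2: Download the driver that matches your Chrome browser version (mine is 81.0.4044.138).

Step 3: After downloading, unzip the file and place chromedriver.exe in the same directory as your script.

Step 4: pip install selenium

Now use the code below: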

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import pandas as pd

# your website url
site = 'https://www.bseindia.com/corporates/Forth_Results.aspx'

# your driver path
# (newer Selenium 4 releases dropped executable_path; there, pass
# service=selenium.webdriver.chrome.service.Service('chromedriver.exe') instead)
driver = webdriver.Chrome(executable_path='chromedriver.exe')
# passing website url
driver.get(site)

# wait until the whole site loads
time.sleep(5)

# click the download icon using its xpath
driver.find_element(By.XPATH, "/html/body/div[1]/form/div[4]/div/div[2]/div/div/div[2]/a/i").click()
# closing browser
driver.close()
# reading Results.csv from the default download directory
df = pd.read_csv("c:/users/viupadhy/downloads/Results.csv")
df

Output:

    Security Code  Security Name  Company name                               Result Date
0   542579         AGOL           Ashapuri Gold Ornament Ltd                 24 Jul 2020
1   500425         AMBUJACEM      AMBUJA CEMENTS LTD.                        24 Jul 2020
2   531223         ANJANI         ANJANI SYNTHETICS LTD.-$                   24 Jul 2020
3   500820         ASIANPAINT     ASIAN PAINTS LTD.                          24 Jul 2020
4   500027         ATUL           ATUL LTD.                                  24 Jul 2020
5   512063         AYOME          AYOKI MERCANTILE LTD.                      24 Jul 2020
6   517246         BCCFUBA        BCC FUBA INDIA LTD.                        24 Jul 2020
7   540700         BRNL           Bharat Road Network Ltd                    24 Jul 2020
8   519600         CCL            CCL PRODUCTS (INDIA) LTD.                  24 Jul 2020
9   531621         CENTERAC       CENTERAC TECHNOLOGIES LTD.                 24 Jul 2020
10  539991         CFEL           Confidence Futuristic Energetech Ltd       24 Jul 2020
11  500110         CHENNPETRO     CHENNAI PETROLEUM CORPORATION LTD.         24 Jul 2020
12  534691         COMCL          COMFORT COMMOTRADE LTD.                    24 Jul 2020
13  531216         COMFINTE       COMFORT INTECH LTD.-$                      24 Jul 2020
14  526829         CONFIPET       CONFIDENCE PETROLEUM INDIA LTD.            24 Jul 2020
15  506395         COROMANDEL     COROMANDEL INTERNATIONAL LTD.              24 Jul 2020
16  539876         CROMPTON       Crompton Greaves Consumer Electricals Ltd  24 Jul 2020
17  526269         CRSTCHM        CRESTCHEM LTD.                             24 Jul 2020
18  541546         GAYAHWS        Gayatri Highways Ltd                       24 Jul 2020
19  500171         GHCL           GHCL LTD.                                  24 Jul 2020
20  524590         HEMORGANIC     Hemo Organic Limited                       24 Jul 2020
21  505725         HINDEVER       HINDUSTAN EVEREST TOOLS LTD.               24 Jul 2020
22  501295         IITL           INDUSTRIAL INVESTMENT TRUST LTD.           24 Jul 2020
23  513295         IMEC           Imec Services Ltd                          24 Jul 2020
24  541300         INDINFR        IndInfravit Trust                          24 Jul 2020
25  500875         ITC            ITC LTD.                                   24 Jul 2020
26  509715         JAYSHREETEA    JAY SHREE TEA & INDUSTRIES LTD.            24 Jul 2020
27  500228         JSWSTEEL       JSW STEEL LTD.                             24 Jul 2020
28  506184         KANANIIND      KANANI INDUSTRIES LTD.                     24 Jul 2020
29  512036         KAPILCO        KAPIL COTEX LTD.                           24 Jul 2020
... ... ... ... ...
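
The path passed to `pd.read_csv` above is tied to one user account. A minimal sketch of building it from the current user's home directory instead, assuming the browser saves to the conventional `Downloads` folder (browsers can be configured otherwise). `default_download_path` is a hypothetical helper name, not part of the original answer.

```python
from pathlib import Path


def default_download_path(file_name: str = "Results.csv") -> Path:
    """Build <home>/Downloads/<file_name> for the current user.

    Assumption: the browser saves to the conventional Downloads folder.
    """
    return Path.home() / "Downloads" / file_name


csv_path = default_download_path()
print(csv_path)
```

The result can be passed straight to pandas, as in `pd.read_csv(csv_path)`.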
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import os
import time
import pandas as pd
# path check
import pathlib

filename = 'C:/Users/Administrator/Downloads/Results.csv'
file = pathlib.Path(filename)

while True:  # retry until the file has been downloaded
    # remove any stale copy so the check at the end only sees a fresh download
    if file.exists():
        os.remove(filename)
    # your website url
    site = 'https://www.bseindia.com/corporates/Forth_Results.aspx'

    # your driver path
    driver = webdriver.Chrome(executable_path='chromedriver.exe')
    # passing website url
    driver.get(site)
    time.sleep(10)
    # wait up to 20 seconds for the download link to be present
    wait = WebDriverWait(driver, 20)
    wait.until(EC.presence_of_element_located((By.ID, 'ContentPlaceHolder1_lnkDownload')))

    # click download icon using xpath
    el = driver.find_element(By.XPATH, "/html/body/div[1]/form/div[4]/div/div[2]/div/div/div[2]/a/i")
    el.click()
    # give the download time to finish
    time.sleep(20)
    driver.close()
    if file.exists():
        break

df = pd.read_csv(filename)
print(df)
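
The fixed `time.sleep(20)` above waits the full 20 seconds even when the download finishes sooner, and may still be too short on a slow connection. A minimal sketch of polling for the file instead: `wait_for_file` and its `timeout` and `poll_interval` parameters are hypothetical names, and the demo uses a temporary directory standing in for the Downloads folder rather than a real browser download.

```python
import tempfile
import time
from pathlib import Path


def wait_for_file(path: Path, timeout: float = 30.0, poll_interval: float = 0.5) -> bool:
    """Poll until `path` exists and is non-empty, or until `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists() and path.stat().st_size > 0:
            return True
        time.sleep(poll_interval)
    return False


# Demo: a temporary directory stands in for the Downloads folder (assumption)
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "Results.csv"
    # File has not been written yet, so this times out
    missing = wait_for_file(target, timeout=0.3, poll_interval=0.1)
    target.write_text("Security Code,Security Name\n500875,ITC\n")
    # File is now present and non-empty
    found = wait_for_file(target, timeout=0.3, poll_interval=0.1)

print(missing, found)
```

In the loop above, a call such as `wait_for_file(file, timeout=60)` could stand in for the fixed sleep. Chrome writes to a temporary `.crdownload` file while a download is in progress, so the appearance of the final file name is a reasonable, though not guaranteed, signal that the download has completed.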
