Downloading csv file from web site using python BeautifulSoup
Downloading csv file from bseindia using python
I want to download Results.csv from 'https://www.bseindia.com/corporates/Forth_Results.aspx'. Basically, I want to get the data as a dataframe. I used the code below to download the file, but it fetches the wrong data.
import requests
import pandas as pd
bse_url = 'https://www.bseindia.com/corporates/Forth_Results.aspx'
r = requests.get(bse_url)
file_name = 'Results.csv'
# Write the response body to disk chunk by chunk
with open(file_name, 'wb') as f:
    for chunk in r.iter_content():
        f.write(chunk)
        f.flush()
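A likely reason for the "wrong data" is that this URL serves the HTML page itself, while the CSV is only produced when the download link on the page is clicked. A minimal sketch of a sanity check on the saved bytes follows; `looks_like_html` is a hypothetical helper, and both sample payloads are made up for illustration rather than taken from the site.

```python
def looks_like_html(payload: bytes) -> bool:
    """Return True if the payload starts like an HTML document rather than CSV."""
    # Skip a UTF-8 byte order mark and leading whitespace, then inspect the start.
    head = payload.lstrip(b"\xef\xbb\xbf \t\r\n")[:200].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Illustrative payloads (assumptions, not real responses from the site)
sample_page = b"<!DOCTYPE html>\n<html><head><title>Results</title></head></html>"
sample_csv = b"Security Code,Security Name,Company name,Result Date\n500875,ITC,ITC LTD.,24 Jul 2020\n"

print(looks_like_html(sample_page))
print(looks_like_html(sample_csv))
```

If the check returns True for the bytes written to Results.csv, the request fetched the page rather than the file.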
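You can do this with the help of selenium. Follow these steps:
Step 1: Download the web driver for Chrome:
First check your Chrome version (browser menu (three vertical dots) -> Help -> About Google Chrome).
Step 2: Download the driver build that matches your Chrome version (mine is 81.0.4044.138).
Step 3: After downloading, unzip the file and place chromedriver.exe in the same directory as your script.
Step 4: pip install selenium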
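The version match in Step 2 can be checked with a small sketch like the one below. It assumes that the first three components of the version string (for example 81.0.4044) must agree between browser and driver; `driver_matches_browser` is a hypothetical helper, and the driver version strings are illustrative.

```python
def version_prefix(version: str, parts: int = 3) -> tuple:
    """Return the first `parts` numeric components of a dotted version string."""
    return tuple(int(p) for p in version.strip().split(".")[:parts])


def driver_matches_browser(browser_version: str, driver_version: str) -> bool:
    """Assumption: browser and driver are compatible when major.minor.build agree."""
    return version_prefix(browser_version) == version_prefix(driver_version)


# Browser version taken from the steps above; driver versions are illustrative.
print(driver_matches_browser("81.0.4044.138", "81.0.4044.69"))
print(driver_matches_browser("81.0.4044.138", "83.0.4103.39"))
```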
Now use the code below:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import pandas as pd
# Your website URL
site = 'https://www.bseindia.com/corporates/Forth_Results.aspx'
# Your driver path (newer Selenium releases take a Service object instead of executable_path)
driver = webdriver.Chrome(executable_path='chromedriver.exe')
# Open the website
driver.get(site)
# Wait until the whole site has loaded
time.sleep(5)
# Click the download icon using its XPath
driver.find_element(By.XPATH, "/html/body/div[1]/form/div[4]/div/div[2]/div/div/div[2]/a/i").click()
# Close the browser
driver.close()
# Read Results.csv from the default download directory
df = pd.read_csv("c:/users/viupadhy/downloads/Results.csv")
df
Output:
     Security Code  Security Name  Company name                               Result Date
0    542579         AGOL           Ashapuri Gold Ornament Ltd                 24 Jul 2020
1    500425         AMBUJACEM      AMBUJA CEMENTS LTD.                        24 Jul 2020
2    531223         ANJANI         ANJANI SYNTHETICS LTD.-$                   24 Jul 2020
3    500820         ASIANPAINT     ASIAN PAINTS LTD.                          24 Jul 2020
4    500027         ATUL           ATUL LTD.                                  24 Jul 2020
5    512063         AYOME          AYOKI MERCANTILE LTD.                      24 Jul 2020
6    517246         BCCFUBA        BCC FUBA INDIA LTD.                        24 Jul 2020
7    540700         BRNL           Bharat Road Network Ltd                    24 Jul 2020
8    519600         CCL            CCL PRODUCTS (INDIA) LTD.                  24 Jul 2020
9    531621         CENTERAC       CENTERAC TECHNOLOGIES LTD.                 24 Jul 2020
10   539991         CFEL           Confidence Futuristic Energetech Ltd       24 Jul 2020
11   500110         CHENNPETRO     CHENNAI PETROLEUM CORPORATION LTD.         24 Jul 2020
12   534691         COMCL          COMFORT COMMOTRADE LTD.                    24 Jul 2020
13   531216         COMFINTE       COMFORT INTECH LTD.-$                      24 Jul 2020
14   526829         CONFIPET       CONFIDENCE PETROLEUM INDIA LTD.            24 Jul 2020
15   506395         COROMANDEL     COROMANDEL INTERNATIONAL LTD.              24 Jul 2020
16   539876         CROMPTON       Crompton Greaves Consumer Electricals Ltd  24 Jul 2020
17   526269         CRSTCHM        CRESTCHEM LTD.                             24 Jul 2020
18   541546         GAYAHWS        Gayatri Highways Ltd                       24 Jul 2020
19   500171         GHCL           GHCL LTD.                                  24 Jul 2020
20   524590         HEMORGANIC     Hemo Organic Limited                       24 Jul 2020
21   505725         HINDEVER       HINDUSTAN EVEREST TOOLS LTD.               24 Jul 2020
22   501295         IITL           INDUSTRIAL INVESTMENT TRUST LTD.           24 Jul 2020
23   513295         IMEC           Imec Services Ltd                          24 Jul 2020
24   541300         INDINFR        IndInfravit Trust                          24 Jul 2020
25   500875         ITC            ITC LTD.                                   24 Jul 2020
26   509715         JAYSHREETEA    JAY SHREE TEA & INDUSTRIES LTD.            24 Jul 2020
27   500228         JSWSTEEL       JSW STEEL LTD.                             24 Jul 2020
28   506184         KANANIIND      KANANI INDUSTRIES LTD.                     24 Jul 2020
29   512036         KAPILCO        KAPIL COTEX LTD.                           24 Jul 2020
...  ...            ...            ...                                        ...
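The read_csv call above hard-codes one user's download folder. A minimal sketch that builds the path from the current user's home directory instead is shown here; it assumes the browser saves to a folder named Downloads under the home directory, which is a common default but not guaranteed, and `default_download_path` is a hypothetical helper.

```python
from pathlib import Path


def default_download_path(file_name: str = "Results.csv") -> Path:
    """Assumption: downloads land in <home>/Downloads for the current user."""
    return Path.home() / "Downloads" / file_name


csv_path = default_download_path()
print(csv_path)
```

The resulting `csv_path` can be passed to `pd.read_csv` in place of the literal path.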
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import os
import pathlib
import time
import pandas as pd
# Download location of the CSV (adjust to your machine)
filename = 'C:/Users/Administrator/Downloads/Results.csv'
file = pathlib.Path(filename)
# Your website URL
site = 'https://www.bseindia.com/corporates/Forth_Results.aspx'
while True:  # Retry until the file has been downloaded
    # Remove any stale copy left over from an earlier run
    if file.exists():
        os.remove(filename)
    # Your driver path
    driver = webdriver.Chrome(executable_path='chromedriver.exe')
    # Open the website
    driver.get(site)
    time.sleep(10)
    # Wait up to 20 seconds for the download link to be present
    wait = WebDriverWait(driver, 20)
    wait.until(EC.presence_of_element_located((By.ID, 'ContentPlaceHolder1_lnkDownload')))
    # Click the download icon using its XPath
    el = driver.find_element(By.XPATH, "/html/body/div[1]/form/div[4]/div/div[2]/div/div/div[2]/a/i")
    el.click()
    # Give the browser time to finish the download
    time.sleep(20)
    driver.close()
    if file.exists():
        break
df = pd.read_csv(filename)
print(df)
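The loop above retries forever and always sleeps a fixed 20 seconds. A bounded variant could look like the sketch below. It assumes the browser only creates the final file name once the download is complete; `wait_for_file` and `download_with_retries` are hypothetical helpers, and `attempt` stands in for whatever function drives the browser and clicks the download link.

```python
import time
from pathlib import Path
from typing import Callable


def wait_for_file(path: Path, timeout: float = 20.0, poll: float = 0.5) -> bool:
    """Poll until `path` exists and is non-empty, or until `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists() and path.stat().st_size > 0:
            return True
        time.sleep(poll)
    # One last check after the deadline
    return path.exists() and path.stat().st_size > 0


def download_with_retries(attempt: Callable[[], object], target: Path,
                          max_attempts: int = 3, timeout: float = 20.0) -> bool:
    """Run `attempt` up to `max_attempts` times until `target` appears."""
    for _ in range(max_attempts):
        # Remove a stale copy so an old file is not mistaken for a new download
        target.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
        attempt()
        if wait_for_file(target, timeout=timeout):
            return True
    return False
```

With this in place, the browser steps would go into a function passed as `attempt`, and `pd.read_csv` would only be called when `download_with_retries` returns True.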