Web scraping with Python: how can I make my web scraping code faster?
I would like to scrape two tables from two links. My code is:
import pandas as pd
import xlwings as xw
from datetime import datetime


def last_row(symbol, name):
    # Decide whether the last row of the df should be deleted,
    # based on the 2 requirements below.
    # Returns True if the last row should be deleted.
    # The deletion itself is performed in the next function.
    requirements = [symbol.lower() == "total", name.isdigit()]
    return all(requirements)


def get_foreigncompanies_info():
    df_list = []
    links = ["https://stockmarketmba.com/nonuscompaniesonusexchanges.php",
             "https://stockmarketmba.com/listofadrs.php"]
    for i in links:
        # Read the table with pandas read_html and keep only the necessary columns.
        df = pd.read_html(i)[0][['Symbol', 'Name', 'GICS Sector']]
        if last_row(df.iloc[-1]['Symbol'], df.iloc[-1]['Name']):
            # Delete the last row
            df_list.append(df.iloc[:-1])
        else:
            # Keep last row
            df_list.append(df)
    return pd.concat(df_list).reset_index(drop=True).rename(columns={'Name': 'Security'})


def open_in_excel(dataframe):  # Code to view my df in Excel.
    xw.view(dataframe)
if __name__ == "__main__":
    start = datetime.now()
    df = get_foreigncompanies_info()
    print(datetime.now() - start)
    # Reuse the DataFrame fetched above instead of scraping both pages a second time.
    open_in_excel(df)
It took ... seconds to run the code.
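To see where the time actually goes (downloading versus parsing), each step could be timed separately. The helper below is only a minimal sketch; the name `timed` is hypothetical and not part of the code above.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    # perf_counter is a monotonic clock intended for measuring short durations.
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical usage with the function from the question:
# df, seconds = timed(get_foreigncompanies_info)
# print(f"scraping took {seconds:.2f} s")
```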
I would like to make the code run faster, in a way that doesn't make too many unnecessary requests. My idea is to download the table as a CSV file, since the website has a "download csv" button.
How can I download the CSV with Python?
I have inspected the button but couldn't find the URL behind it. (If you can find it, please also describe how you found it, perhaps with an "inspect" screenshot.)
Or is there any other, faster way to download the tables?
Thank you for any pointers :-)
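One option that might help without adding requests is to fetch the two pages at the same time instead of one after the other. This is a rough sketch under assumptions, not a verified solution: `fetch_html` and `fetch_all` are hypothetical helper names, the `User-Agent` header is a guess at what the site accepts, and the gain only applies if most of the time is spent waiting on the network rather than parsing.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

LINKS = [
    "https://stockmarketmba.com/nonuscompaniesonusexchanges.php",
    "https://stockmarketmba.com/listofadrs.php",
]


def fetch_html(url, timeout=30):
    """Download one page and return its HTML as text (one request per URL)."""
    # Assumption: a browser-like User-Agent is sent in case the default is rejected.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(urls, fetch=fetch_html):
    """Fetch every URL in its own thread; results keep the order of `urls`."""
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return list(pool.map(fetch, urls))


# Hypothetical usage (performs two real requests, one per link):
# pages = fetch_all(LINKS)
```

Each returned HTML string could then be parsed with `pd.read_html(io.StringIO(html))[0]` in place of passing the URL to `pd.read_html`, so the total number of requests stays at two.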