Web scraping with Python: how can I make my web scraping code faster?

Question

I want to scrape two tables from two links. My code is:

import pandas as pd
import xlwings as xw
from datetime import datetime

def last_row(symbol, name):

    # Returns True if the last row of the df should be deleted, i.e. if it
    # meets both requirements below. The deletion itself is performed in
    # get_foreigncompanies_info().

    # str() guards against non-string cells (e.g. NaN) in the scraped table.
    requirements = [str(symbol).lower() == "total", str(name).isdigit()]
    return all(requirements)

def get_foreigncompanies_info():
    df_list = []
    links = ["https://stockmarketmba.com/nonuscompaniesonusexchanges.php",
              "https://stockmarketmba.com/listofadrs.php"]
    for i in links:

        #Reads table with pandas read_html and only save the necessary columns.

        df = pd.read_html(i)[0][['Symbol', 'Name', 'GICS Sector']] 
        if last_row(df.iloc[-1]['Symbol'], df.iloc[-1]['Name']):

            # Delete the last row

            df_list.append(df.iloc[:-1])
        else:

            # Keep last row

            df_list.append(df)
    return pd.concat(df_list).reset_index(drop=True).rename(columns={'Name': 'Security'})

def open_in_excel(dataframe):  # Code to view my df in excel.
    xw.view(dataframe)
    
if __name__ == "__main__":
    start = datetime.now()
    df = get_foreigncompanies_info()
    print(datetime.now() - start)
    open_in_excel(df)  # reuse the result instead of scraping both pages a second time
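A possible simplification, sketched under the assumption that the summary row is always recognisable by the same two checks last_row uses ("Total" in Symbol, a number in Name): fold the check and the slicing into one helper so the loop body becomes a single append. The name drop_total_row is hypothetical, and this is a readability change, not a speed-up.

```python
import pandas as pd


def drop_total_row(df: pd.DataFrame) -> pd.DataFrame:
    """Return df without its last row if that row looks like a summary row.

    Hypothetical helper: it applies the same two checks as last_row()
    (Symbol == "total", case-insensitive, and Name consisting only of digits).
    """
    if df.empty:
        return df
    last = df.iloc[-1]
    is_total = (
        str(last["Symbol"]).strip().lower() == "total"
        and str(last["Name"]).strip().isdigit()
    )
    # Slice off the final row only when both conditions hold.
    return df.iloc[:-1] if is_total else df
```

Inside the loop this would read: df_list.append(drop_total_row(pd.read_html(i)[0][['Symbol', 'Name', 'GICS Sector']])).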

It took  seconds to run the code.
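For measuring how long the scrape takes, time.perf_counter() is the clock Python provides for timing durations (datetime.now() reads the wall clock, which can jump if the system time is adjusted). A small sketch of a context manager, with the hypothetical name timed, that keeps the measurement out of the scraping code:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print (and optionally record) how long the with-block took, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed  # store for later comparison between runs
        print(f"{label}: {elapsed:.2f} s")
```

Usage: with timed("scrape"): df = get_foreigncompanies_info()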

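I want to make the code run faster (in a way that doesn't send too many unnecessary requests). My idea is to download the tables as CSV, because the website has a "Download CSV" button.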

How can I download the CSV with Python?
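If the URL behind the button were known, the download itself would be one HTTP request plus CSV parsing. A minimal stdlib sketch, assuming a plain CSV endpoint exists: CSV_URL is a placeholder, not the site's real export address, and the User-Agent header is an assumption (some sites reject the default Python one).

```python
import csv
import io
import urllib.request

# PLACEHOLDER: the real endpoint behind the "Download CSV" button is unknown.
CSV_URL = "https://example.com/path/to/export.csv"


def download_csv(url, timeout=30):
    """Fetch a URL and return its body decoded as text (one request per call)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


def parse_csv_text(text, columns=("Symbol", "Name", "GICS Sector")):
    """Parse CSV text and keep only the requested columns (missing ones become None)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: row.get(col) for col in columns} for row in reader]
```

Usage: rows = parse_csv_text(download_csv(CSV_URL)). With pandas, pd.read_csv also accepts a URL and returns a DataFrame directly, so it could stand in for pd.read_html(i)[0] in the original loop.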

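I inspected the button but couldn't find the URL. (If you can find it, please also describe how you found it, perhaps with a screenshot of "Inspect".)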
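One way to look for the URL without screenshots is to scan the page's HTML for attributes that mention "csv" (link targets, form actions, onclick handlers). A stdlib sketch; the keyword and the attribute list are assumptions. If nothing turns up, the file is probably generated in the browser by JavaScript, and the Network tab of the browser's developer tools (open it, then click the button) shows whether any request is made at all.

```python
from html.parser import HTMLParser


class DownloadLinkFinder(HTMLParser):
    """Collect (tag, attribute, value) triples whose value mentions a keyword."""

    # Attributes that commonly carry a URL or a click handler (assumed list).
    WATCHED = ("href", "action", "formaction", "onclick")

    def __init__(self, keyword="csv"):
        super().__init__()
        self.keyword = keyword.lower()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue  # attribute without a value, e.g. <button disabled>
            if name in self.WATCHED or name.startswith("data-"):
                if self.keyword in value.lower():
                    self.candidates.append((tag, name, value))


def find_candidates(html, keyword="csv"):
    finder = DownloadLinkFinder(keyword)
    finder.feed(html)
    finder.close()
    return finder.candidates
```

The triples it returns are only leads to check by hand; a match does not guarantee a downloadable CSV.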

Or is there another, faster way to download the tables?
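One option that keeps the number of requests the same is to fetch both pages concurrently instead of one after the other. Threads suit this because most of the time is spent waiting on the network; the gain is bounded, since with two pages the best case is roughly the time of the slower page rather than the sum of both. A sketch using concurrent.futures, where fetch_all is a hypothetical helper:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL in worker threads.

    pool.map returns results in the same order as urls, so the output can be
    concatenated exactly like the sequential version.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Applied to the original code, this could look like: tables = fetch_all(links, lambda u: pd.read_html(u)[0][['Symbol', 'Name', 'GICS Sector']]).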

Thanks for any pointers :-)

Answer 1

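You could use Selenium to click the button automatically. It isn't difficult, but it is a lot of effort for something so trivial. I don't like scraping, but sometimes that's all we have, right?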

Solution 1
1 2021-04-22 14:41:34
