Getting data from a web page using BeautifulSoup

Question

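I want to download data from a page where the link to each data file sits in a row of a table. (A picture of the table is attached.) I wrote some code with BeautifulSoup to read the href of every row, but it does not give me the list of download links. My guess is that it cannot see the table data (td) inside each table row (tr).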

    from bs4 import BeautifulSoup
    import urllib.request
    
    testurl = 'https://www.ercot.com/mp/data-products/data-product-details?id=NP3-562-CD'
    page = urllib.request.urlopen(testurl)
    page_content = BeautifulSoup(page, "html.parser")
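    # find_all() returns a list of tables, so select the rows of each table in turn
    table_dt = page_content.find_all("table")
    for table in table_dt:
        for tt in table.select("tr"):
            print(tt)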

    # Output:
    <tr>
    <th>Friendly Name</th>
    <th colspan="2">Posted</th>
    <th>Available Files</th>
    </tr>

Printing the table itself shows:

    [<table class="table table-condensed report-table" id="reportTable">
     <thead>
     <tr>
     <th>Friendly Name</th>
     <th colspan="2">Posted</th>
     <th>Available Files</th>
     </tr>
     </thead>
     <tbody>
     </tbody>
     </table>]

As you can see, there is no information for any other row (tr); only the header row is captured, and the tbody is empty.

Could you guide me on how to get the data link from each row so that I can download the files?

Answer 1

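Most likely, only the structure of the table is in the original HTML page, and the row data is retrieved afterwards by a JavaScript request. BeautifulSoup parses just the HTML that the server returns and does not run scripts, which is why the tbody comes back empty. If you can figure out what the JavaScript request is (probably by using your browser's "Web Developer" tools and watching the network traffic while the page loads), you can fetch the data that way.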
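As a rough sketch of that approach, assuming the request you find in the developer tools returns JSON: the code below fetches such a response and collects every string value that looks like a link. The names `fetch_json` and `collect_links` are made up for this illustration, and both the endpoint URL and the shape of the JSON are unknown here, so the walker simply searches the whole structure instead of relying on particular keys.

```python
import json
import urllib.request
from urllib.parse import urljoin


def fetch_json(url):
    """Download a URL and parse the body as JSON.

    `url` is whatever request you spotted in the browser's network tab;
    it is not known here and must be filled in by you.
    """
    # Some servers reject requests without a browser-like User-Agent.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def collect_links(node, base_url):
    """Walk nested dicts/lists and return all string values that look like links.

    A string counts as a link if it starts with http://, https:// or "/".
    Relative paths are resolved against base_url.
    """
    links = []
    if isinstance(node, dict):
        for value in node.values():
            links.extend(collect_links(value, base_url))
    elif isinstance(node, list):
        for item in node:
            links.extend(collect_links(item, base_url))
    elif isinstance(node, str) and node.startswith(("http://", "https://", "/")):
        links.append(urljoin(base_url, node))
    return links


# Example with a made-up payload (the real response will look different):
sample = {
    "rows": [
        {"name": "report_a", "link": {"href": "/files/a.zip"}},
        {"name": "report_b", "link": {"href": "https://example.com/b.zip"}},
    ]
}
sample_links = collect_links(sample, "https://example.com/page")
```

With a real endpoint you would call `collect_links(fetch_json(endpoint_url), testurl)` and then download each result, for example with `urllib.request.urlretrieve`. If the response turns out to be HTML or XML rather than JSON, parse it with BeautifulSoup instead.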

1 solution

Solution 1
1 2022-09-15 00:33:05