How do I traverse nested web pages when web scraping?

Question

I want to scrape data from a website:

https://www.industrynet.com/companies/

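I plan to get the name and location of every company listed on this site. I think I need to loop through each page somehow, but I'm not sure how to do that when the data sits on a page nested inside another page.

I've only barely managed to scrape a single page, so any help would be appreciated.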

Answer 1

You can think of your scraping process as a tree in which you visit every branch of pages. In rough pseudocode, it looks something like this:

    company_details = {}
    request the landing page and parse it
    for letter_href in landing_page:
        request the letter_href URL (the company_code page) and parse it
        company_code = some_code_you_scraped
        for company_href in company_code_page:
            request the company_href URL and parse it
            add the company's info to company_details, including the company_code from the previous page
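
The pseudocode above could be sketched in Python roughly as follows. This is only a minimal sketch using the standard library, not a tested scraper for this site: the names `LinkCollector`, `links_in`, `crawl`, `is_category`, `is_company`, and `parse_company` are all hypothetical, and because the site's actual HTML layout is not described here, the rules for recognizing category links, company links, and company fields are left as functions you supply yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None if no href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def fetch(url):
    """Download a page and return its HTML as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def links_in(url, get=fetch):
    """Return absolute (url, text) pairs for the links found at url."""
    parser = LinkCollector()
    parser.feed(get(url))
    return [(urljoin(url, href), text) for href, text in parser.links]


def crawl(start_url, is_category, is_company, parse_company, get=fetch):
    """Walk landing page -> category pages -> company pages.

    is_category / is_company decide which links to follow (assumptions
    about the site's URL scheme); parse_company turns a company page's
    HTML into a dict of fields such as name and location.
    """
    company_details = {}
    for category_url, category_name in links_in(start_url, get):
        if not is_category(category_url):
            continue
        for company_url, company_name in links_in(category_url, get):
            if not is_company(company_url):
                continue
            info = dict(parse_company(get(company_url)))
            info["name"] = company_name
            info["category"] = category_name  # carried down from the parent page
            company_details[company_url] = info
    return company_details
```

Passing the page-fetching function in as `get` keeps the traversal logic separate from the network, so you can try the loop on a few saved pages first. For a real crawl you would probably also want a delay between requests and a check of the site's terms of use and robots.txt.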

Hope this helps!

Solution 1
1 2019-01-29 15:19:18
