Python: collecting the URLs behind links

Question

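I have several websites, and each of them has links. When I hover over these links, I can see a URL in the browser's status bar. I need to collect these URLs with Python. When I view the page source, the "href" attributes do not contain these URLs, which suggests they are produced by JavaScript.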

Is there a way to actually collect these URLs with Python? Thanks.

Answer 1

Using your browser's developer tools, you can inspect the button elements and see that their onClick handler calls the function getCompYData. That function is defined as:

function getCompYData(t, a, b) {
  $("#yearlySmbData").empty(), $("#mheader").html(b), $.post("annQtrStmts.php", {
    name: "get_comp_y_data",
    smbCode: t,
    year: a
  }, function(t) {
    obj = JSON.parse(t), $("#yearlySmbData").createTable(obj, {})
  })
}

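By sending an HTTP POST request to annQtrStmts.php with the name string (get_comp_y_data), an smbCode (e.g. AABS) and a year (e.g. 2020), you should be able to retrieve the corresponding data, i.e. the JSON that the page turns into a table.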
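A minimal sketch of that request in Python, using only the standard library (requests.post(endpoint, data=payload) would send an equivalent form-encoded body). It is built on several assumptions not stated in the answer: the buttons carry inline attributes such as onclick="getCompYData('AABS', 2020, ...)" in the page source, page_url is the URL of the page hosting the buttons (the site is not named here), and the response body is the JSON string the page passes to JSON.parse. The helper names find_yearly_calls, build_yearly_request and fetch_yearly_data are hypothetical.

```python
import json
import re
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

# Assumption: the buttons carry inline handlers such as
#   onclick="getCompYData('AABS', 2020, 'Some header')"
# Captures the first argument (smbCode) and the second (a 4-digit year).
CALL_RE = re.compile(r"""getCompYData\(\s*['"]([^'"]+)['"]\s*,\s*['"]?(\d{4})['"]?""")


def find_yearly_calls(page_html):
    """Return (smbCode, year) pairs for every getCompYData(...) call in the HTML."""
    return [(code, int(year)) for code, year in CALL_RE.findall(page_html)]


def build_yearly_request(page_url, smb_code, year):
    """Build the same form-encoded POST that $.post("annQtrStmts.php", ...) sends."""
    # $.post resolves the relative path against the page URL, as urljoin does.
    endpoint = urljoin(page_url, "annQtrStmts.php")
    body = urlencode({"name": "get_comp_y_data", "smbCode": smb_code, "year": year})
    return Request(
        endpoint,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_yearly_data(page_url, smb_code, year):
    """Send the request and decode the JSON string the page passes to JSON.parse."""
    with urlopen(build_yearly_request(page_url, smb_code, year), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Typical use: download the page HTML, then call fetch_yearly_data(page_url, code, year) for each pair returned by find_yearly_calls(html). If the regex finds nothing, the handlers are probably attached from a script rather than inline, and the smbCode/year values have to be read from that script instead.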

Keep in mind that doing this may violate the site's terms and conditions.

Edit: based on the updated question, you actually want to look at this function:

function getCompData() {
  var t = $("#country").val();
  $(".nav-link").removeClass("active"), $("#yearlyData").empty(), $("#annRpt").html("Financial Reports <br><br>" + $("#country option:selected").text() + " ( " + t + " )"), $.post("annQtrStmts.php", {
    name: "get_comp_data",
    smbCode: t
  }, function(t) {
    obj = JSON.parse(t), $("#yearlyData").createTable(obj, {})
  })
}

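The endpoint is the same, but in this case you pass a different name string (get_comp_data) and no year; the smbCode is the value currently selected in the #country dropdown.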
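Since getCompData reads smbCode from $("#country").val(), one way to cover every company is to read the option values of that dropdown from the page HTML and build one get_comp_data body per value. This is a sketch with the standard-library HTML parser; it assumes the dropdown is present in the server-sent HTML as <select id="country"> with explicit value attributes, and the names CountrySelectParser and comp_data_bodies are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class CountrySelectParser(HTMLParser):
    """Collect option values and labels from <select id="country">."""

    def __init__(self):
        super().__init__()
        self.in_select = False
        self.in_option = False
        self.values = []
        self.labels = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "select" and attr_map.get("id") == "country":
            self.in_select = True
        elif tag == "option" and self.in_select:
            # Assumes each option carries an explicit value attribute.
            self.in_option = True
            self.values.append(attr_map.get("value") or "")
            self.labels.append("")

    def handle_endtag(self, tag):
        if tag == "option":
            self.in_option = False
        elif tag == "select":
            self.in_select = False

    def handle_data(self, data):
        # Option text is what $("#country option:selected").text() would show.
        if self.in_option:
            self.labels[-1] += data


def comp_data_bodies(page_html):
    """Return (label, POST body) pairs, one get_comp_data request per company code."""
    parser = CountrySelectParser()
    parser.feed(page_html)
    parser.close()
    return [
        (label.strip(), urlencode({"name": "get_comp_data", "smbCode": value}))
        for value, label in zip(parser.values, parser.labels)
        if value  # skip placeholder options such as <option value="">
    ]
```

Each body can then be POSTed to annQtrStmts.php (resolved against the page URL) with urllib.request.Request(endpoint, data=body.encode("ascii")); urllib adds the application/x-www-form-urlencoded Content-Type for POST data when none is set.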

Answer 2

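import requests
from bs4 import BeautifulSoup

def getMyUrl(*urls):
    """Print the href of every <a> tag in the static HTML of each URL."""
    for url in urls:
        # One GET is enough; a separate HEAD request is redundant
        # (and some servers reject HEAD).
        response = requests.get(url, timeout=10)
        if response.status_code != 200:
            continue
        soup = BeautifulSoup(response.text, "html.parser")
        # Only links present in the server-sent HTML are found here;
        # links generated later by JavaScript will not appear.
        # href=True skips <a> tags that have no href attribute.
        for a_tag in soup.find_all("a", href=True):
            print(a_tag["href"])

# Use it like this:

if __name__ == "__main__":
    getMyUrl("https://www.google.com", "https://example.com")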


2 solutions

Solution 1
1 vote, accepted, 2020-06-28 17:59:12

Solution 2
0 votes, 2020-06-28 18:05:01