简体   繁体   English

在Python beautifulsoup中提取tr值的href链接

[英]Extracting the href link of a tr value in Python beautifulsoup

The code snippet below is working.下面的代码片段正在运行。 As part of improving it, I wanted to extract the href link and add it to the list of data being displayed.作为改进它的一部分,我想提取 href 链接并将其添加到正在显示的数据列表中。

import requests
from bs4 import BeautifulSoup
from itertools import groupby

url = "https://bscscan.com/tokentxns"

soup = BeautifulSoup(requests.get(url).content, "html.parser")

data = []
for tr in soup.select("tr:has(td)"):
    tds = [td.get_text(strip=True) for td in tr.select("td")]
    _, txn_hash, tm, age, from_, _, to_, value, token = tds
    data.append((token, value, txn_hash))

data = sorted(data)
for _, g in groupby(data, lambda k: k[0]):
    g = list(map(list, g))
    for subl in g[1:]:
        subl[0] = ""

    for subl in g:
        print("{:<27} {:<27} {:<60}".format(*subl))
    print()

Current Output:电流输出:

    Wrapped BNB (WBNB)      0.013344799772136381        0xe45d252ffd82e6720ea1993f95670bb03f130fbe129800d0353e6a54b47f1ab1
                            0.01534839792812691         0x2a6519d13e3bed1b14724a3712a10ee902941c1de9c1fc23f81c438a6954e353
                            0.018368                    0x9cc3beff8b8e70a265fca4f3c95eb5ef3ea01a8731241a22214f0c25e61effa3
                            
    CryptoBlades...(SKILL)  0.971749999999999991        0x885026cd5c6aa9788bc1ef37b4d94f185384d2fcf31a8f44e605bd597b41c9d8
                            0.971749999999999991        0xe0afde7005bde28039ee2ab9c9260df4feed43ec904870c796fa197a12ded1d4
                            0.971749999999999991        0xe94a019d6e473a5485bff9e3e732a9bb9f7e35d4d040cea1866463d771ffbd42

Improvements Needed: # add the href link of the token name (link, token, values, txnhash)需要改进:#添加令牌名称的href链接(链接,令牌,值,txnhash)

    https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c    Wrapped BNB (WBNB)      0.013344799772136381        0xe45d252ffd82e6720ea1993f95670bb03f130fbe129800d0353e6a54b47f1ab1
                                                                                                    0.01534839792812691         0x2a6519d13e3bed1b14724a3712a10ee902941c1de9c1fc23f81c438a6954e353
                                                                                                    0.018368                    0x9cc3beff8b8e70a265fca4f3c95eb5ef3ea01a8731241a22214f0c25e61effa3
                            
    https://bscscan.com/token/0x154a9f9cbd3449ad22fdae23044319d6ef2a1fab    CryptoBlades...(SKILL)  0.971749999999999991        0x885026cd5c6aa9788bc1ef37b4d94f185384d2fcf31a8f44e605bd597b41c9d8
                                                                                                    0.971749999999999991        0xe0afde7005bde28039ee2ab9c9260df4feed43ec904870c796fa197a12ded1d4
                                                                                                    0.971749999999999991        0xe94a019d6e473a5485bff9e3e732a9bb9f7e35d4d040cea1866463d771ffbd42

Not sure how you are going to sort the print formatting to look nice with that many columns but you can define the base url string as:不确定您将如何对打印格式进行排序以在这么多列中看起来不错,但您可以将基本 url 字符串定义为:

base = 'https://bscscan.com'

Then append the links into data:然后将链接附加到数据中:

for tr in soup.select("tr:has(td)"):
    tds = [td.get_text(strip=True) for td in tr.select("td")]
    _, txn_hash, tm, age, from_, _, to_, value, token = tds
    data.append((txn_hash, token, value,
                 base + tr.select_one('td:nth-child(2) a')['href'], #hash_link
                 base + tr.select_one('td:nth-child(5) a')['href'], #from_link
                 base + tr.select_one('td:nth-child(7) a')['href'], #to_link
                 base + tr.select_one('td:nth-child(9) a')['href'] #token_link
               ))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM