Can't scrape all of the ul tags from a table
I am trying to scrape all of the proxy IPs from this site: https://proxy-list.org/english/index.php but I can only get one IP here.
from helium import *
from bs4 import BeautifulSoup
url = 'https://proxy-list.org/english/index.php'
browser = start_chrome(url, headless=True)
soup = BeautifulSoup(browser.page_source, 'html.parser')
proxies = soup.find_all('div', {'class':'table'})
for ips in proxies:
    print(ips.find('li', {'class':'proxy'}).text)
I tried using ips.find_all but it didn't work.
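A likely reason only one proxy prints is that BeautifulSoup's find returns just the first matching element, while find_all returns every match. A minimal, self-contained sketch of the difference, using made-up HTML that mimics the proxy table's structure:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML shaped like the site's proxy table
html = """
<div class="table">
  <ul><li class="proxy">1.1.1.1:80</li></ul>
  <ul><li class="proxy">2.2.2.2:8080</li></ul>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('div', {'class': 'table'})

# find() stops at the first match
print(table.find('li', {'class': 'proxy'}).text)

# find_all() returns every match, so iterate over it
for li in table.find_all('li', {'class': 'proxy'}):
    print(li.text)
```

On the real page there is an extra wrinkle: the addresses are injected by JavaScript, so the li text is not available from the raw HTML alone.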
from bs4 import BeautifulSoup
import requests
url = 'https://proxy-list.org/english/index.php'
pagecontent = requests.get(url)
soup = BeautifulSoup(pagecontent.text, 'html.parser')
maintable = soup.find_all('div', {'class':'table'})
for div_element in maintable:
    rows = div_element.find_all('li', class_='proxy')
    for ip in rows:
        print(ip.text)
If I got your question right, the following is one of the ways to get those proxies using the requests module and the BeautifulSoup library:
import re
import base64
import requests
from bs4 import BeautifulSoup
url = 'https://proxy-list.org/english/index.php'
def decode_proxy(target_str):
    converted_proxy = base64.b64decode(target_str)
    return converted_proxy.decode()
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
for tr in soup.select("#proxy-table li.proxy > script"):
    proxy_id = re.findall(r"Proxy[^']+(.*)\'", tr.contents[0])[0]
    print(decode_proxy(proxy_id))
The first few results:
62.80.180.111:8080
68.183.221.156:38159
189.201.134.13:8080
178.60.201.44:8080
128.199.79.15:8080
139.59.78.193:8080
103.148.216.5:80
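The decoding step above works because the page embeds each address as a base64 string inside a script call; extracting the quoted argument and base64-decoding it yields the ip:port pair. A minimal stdlib-only sketch with a made-up script snippet (the Proxy('...') shape is an assumption based on the answer's regex):

```python
import re
import base64

# Hypothetical <script> content like the one inside each li.proxy element
script_text = "Proxy('MTI3LjAuMC4xOjgwODA=')"

# Grab the quoted base64 argument, then decode it back to ip:port
encoded = re.findall(r"Proxy\('(.*)'\)", script_text)[0]
print(base64.b64decode(encoded).decode())  # 127.0.0.1:8080
```

Because the decoding happens in Python rather than in the browser, no Selenium/helium session is needed for this approach.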