Can't scrape all ul tags from a table

Question

I'm trying to scrape all of the proxy IPs from this site: https://proxy-list.org/english/index.php, but I can only get one IP at most. Here is my code:

from helium import *
from bs4 import BeautifulSoup
url = 'https://proxy-list.org/english/index.php'
browser = start_chrome(url, headless=True)
soup = BeautifulSoup(browser.page_source, 'html.parser')
proxies = soup.find_all('div', {'class':'table'})
for ips in proxies:
    print(ips.find('li', {'class':'proxy'}).text)

I tried to use ips.find_all, but it didn't work.

Answer 1

from bs4 import BeautifulSoup
import requests

url = 'https://proxy-list.org/english/index.php'

# Fetch the page and parse the response body
pagecontent = requests.get(url)
soup = BeautifulSoup(pagecontent.text, 'html.parser')
maintable = soup.find_all('div', {'class':'table'})
for div_element in maintable:
    # find_all returns every matching li, not just the first one
    rows = div_element.find_all('li', class_='proxy')
    for ip in rows:
        print(ip.text)
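
The difference between find and find_all is easier to see on a small inline document than on the live site. The sketch below is only an illustration: the HTML snippet and its addresses are made up, and the layout (one div with class "table" holding several li elements with class "proxy") is an assumption based on the selectors used in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the selectors from the question;
# the addresses are invented for this example.
html = """
<div class="table">
  <ul><li class="proxy">10.0.0.1:8080</li></ul>
  <ul><li class="proxy">10.0.0.2:3128</li></ul>
  <ul><li class="proxy">10.0.0.3:80</li></ul>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('div', {'class': 'table'})

# find() stops at the first match, so only one address comes back.
first_only = table.find('li', class_='proxy').text

# find_all() returns every match inside the div.
all_proxies = [li.text for li in table.find_all('li', class_='proxy')]

print(first_only)
print(all_proxies)
```

If the page wraps every row in a single div, looping over the divs and calling find on each one can yield at most one address per div, which would explain the behaviour described in the question.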

Answer 2

If I understand your question correctly, the following is one way to fetch those proxies using the requests module and the BeautifulSoup library:

import re
import base64
import requests
from bs4 import BeautifulSoup

url = 'https://proxy-list.org/english/index.php'

def decode_proxy(target_str):
    # The addresses are base64-encoded in the page source
    converted_proxy = base64.b64decode(target_str)
    return converted_proxy.decode()

res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
# Each proxy cell holds a script tag that carries the encoded address
for tr in soup.select("#proxy-table li.proxy > script"):
    proxy_id = re.findall(r"Proxy[^']+(.*)\'",tr.contents[0])[0]
    print(decode_proxy(proxy_id))

First few results:

62.80.180.111:8080
68.183.221.156:38159
189.201.134.13:8080
178.60.201.44:8080
128.199.79.15:8080
139.59.78.193:8080
103.148.216.5:80
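
The decoding step in this answer can be tried without touching the network. The following is a minimal sketch, assuming each script body looks like Proxy('<base64 text>'); that format is inferred from the regular expression above and is not confirmed against the site. The helper name extract_proxy and the stricter pattern are hypothetical, introduced only for this illustration.

```python
import base64
import re

def extract_proxy(script_text):
    """Pull the base64 payload out of a Proxy('...') call and decode it.

    Hypothetical helper; returns None when the text does not match.
    """
    match = re.search(r"Proxy\('([^']+)'\)", script_text)
    if match is None:
        return None
    return base64.b64decode(match.group(1)).decode()

# Build a sample script body in the assumed format by encoding
# one of the addresses listed in the results above.
sample_address = '62.80.180.111:8080'
encoded = base64.b64encode(sample_address.encode()).decode()
script_text = f"Proxy('{encoded}')"

decoded = extract_proxy(script_text)
print(script_text)
print(decoded)
```

Returning None for text that does not match keeps a single odd row from raising an exception in the middle of a loop over many rows.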


Answer 1: score 1, accepted, 2021-04-11 09:56:48

Answer 2: score 0, 2021-04-11 10:53:38
