
Can't scrape all of the ul tags from a table

I'm trying to scrape all of the proxy IPs from this site: https://proxy-list.org/english/index.php, but I can only get one of them:

from helium import *
from bs4 import BeautifulSoup
url = 'https://proxy-list.org/english/index.php'
browser = start_chrome(url, headless=True)
soup = BeautifulSoup(browser.page_source, 'html.parser')
proxies = soup.find_all('div', {'class':'table'})
for ips in proxies:
    print(ips.find('li', {'class':'proxy'}).text)

I tried using ips.find_all, but it didn't work.

from bs4 import BeautifulSoup
import requests

url = 'https://proxy-list.org/english/index.php'

# Fetch the page with requests; there is no browser object in this script
pagecontent = requests.get(url)
soup = BeautifulSoup(pagecontent.text, 'html.parser')
maintable = soup.find_all('div', {'class':'table'})
for div_element in maintable:
    # find_all returns every matching <li>, whereas find stops at the first one
    rows = div_element.find_all('li', class_='proxy')
    for ip in rows:
        print(ip.text)
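
The difference between find and find_all can be checked offline. Here is a minimal sketch, assuming a simplified, made-up HTML fragment (SAMPLE_HTML and both helper functions are hypothetical, and the real page's markup may well differ). It only illustrates that find returns the first match while find_all returns all of them:

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modeled on the question; NOT copied from the real site.
SAMPLE_HTML = """
<div class="table">
  <ul><li class="proxy">10.0.0.1:8080</li></ul>
  <ul><li class="proxy">10.0.0.2:3128</li></ul>
  <ul><li class="proxy">10.0.0.3:80</li></ul>
</div>
"""

def first_proxy(html):
    """Mimic the question's loop body: find() stops at the first <li class="proxy">."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("div", class_="table")
    return table.find("li", class_="proxy").text

def all_proxies(html):
    """Collect every <li class="proxy"> inside every <div class="table">."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        li.text
        for div in soup.find_all("div", class_="table")
        for li in div.find_all("li", class_="proxy")
    ]

print(first_proxy(SAMPLE_HTML))
print(all_proxies(SAMPLE_HTML))
```

If the page has a single div with class "table", the outer loop runs once, so calling find inside it can only ever yield one entry.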

If I understood your question correctly, the following is one way to fetch those proxies using the requests module and the BeautifulSoup library:

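import re
import base64
import requests
from bs4 import BeautifulSoup

url = 'https://proxy-list.org/english/index.php'

def decode_proxy(target_str):
    # The addresses are base64-encoded; by default b64decode discards
    # characters outside the base64 alphabet before decoding
    converted_proxy = base64.b64decode(target_str)
    return converted_proxy.decode()

res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
# Each matched element is the inline <script> inside an li.proxy
for tr in soup.select("#proxy-table li.proxy > script"):
    # The captured group keeps the opening quote, which b64decode then discards
    proxy_id = re.findall(r"Proxy[^']+(.*)\'",tr.contents[0])[0]
    print(decode_proxy(proxy_id))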

The first few results:

62.80.180.111:8080
68.183.221.156:38159
189.201.134.13:8080
178.60.201.44:8080
128.199.79.15:8080
139.59.78.193:8080
103.148.216.5:80
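
To try the decoding step without hitting the site, here is a small offline sketch. It assumes each inline script has the form Proxy('<base64 string>'), which is what the regex in the answer implies. The pattern PROXY_CALL, the helper extract_proxy, and the sample payload are hypothetical and built locally, not taken from the page:

```python
import base64
import re

# Assumed shape of the inline script call; the real page may differ.
PROXY_CALL = re.compile(r"Proxy\('([A-Za-z0-9+/=]+)'\)")

def extract_proxy(script_text):
    """Return the decoded 'host:port' string, or None if no Proxy('...') call is found."""
    match = PROXY_CALL.search(script_text)
    if match is None:
        return None
    return base64.b64decode(match.group(1)).decode("ascii")

# Build a sample payload locally so nothing is fetched from the network.
encoded = base64.b64encode(b"62.80.180.111:8080").decode("ascii")
sample_script = f"Proxy('{encoded}')"

print(extract_proxy(sample_script))    # decoded host:port
print(extract_proxy("no proxy here"))  # no match, so None
```

Capturing only the base64 characters between the quotes avoids relying on b64decode to silently drop the stray quote, and returning None makes rows without a Proxy call easy to skip.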
