
Can't scrape all of ul tags from a table

I'm trying to scrape all of the proxy IPs from this site: https://proxy-list.org/english/index.php but I can only get one IP at most. Here is my code:

from helium import *
from bs4 import BeautifulSoup

url = 'https://proxy-list.org/english/index.php'
browser = start_chrome(url, headless=True)
soup = BeautifulSoup(browser.page_source, 'html.parser')
proxies = soup.find_all('div', {'class':'table'})
for ips in proxies:
    print(ips.find('li', {'class':'proxy'}).text)

I tried to use ips.find_all but it didn't work.

from bs4 import BeautifulSoup
import requests

url = 'https://proxy-list.org/english/index.php'

pagecontent = requests.get(url)
soup = BeautifulSoup(pagecontent.text, 'html.parser')
maintable = soup.find_all('div', {'class':'table'})
for div_element in maintable:
    rows = div_element.find_all('li', class_='proxy')
    for ip in rows:
        print(ip.text)
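The key change relative to the question's loop is calling `find_all` (which returns every match) instead of `find` (which returns only the first). A minimal, self-contained illustration on made-up markup shaped like the proxy table (the addresses here are placeholders, not real data):

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure of the proxy table
html = """
<div class="table">
  <ul>
    <li class="proxy">1.1.1.1:80</li>
    <li class="proxy">2.2.2.2:8080</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('div', {'class': 'table'})

first_ip = table.find('li', class_='proxy').text                     # only the first <li>
all_ips = [li.text for li in table.find_all('li', class_='proxy')]   # every <li>

print(first_ip)   # 1.1.1.1:80
print(all_ips)    # ['1.1.1.1:80', '2.2.2.2:8080']
```

Note that on the live page the `li.proxy` elements are filled in by JavaScript, so this pattern only applies to already-rendered HTML (e.g. helium's `browser.page_source`).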

If I get your question right, the following is one of the ways you can fetch those proxies using the requests module and the BeautifulSoup library:

import re
import base64
import requests
from bs4 import BeautifulSoup

url = 'https://proxy-list.org/english/index.php'

def decode_proxy(target_str):
    converted_proxy = base64.b64decode(target_str)
    return converted_proxy.decode()

res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
for tr in soup.select("#proxy-table li.proxy > script"):
    proxy_id = re.findall(r"Proxy[^']+(.*)'", tr.contents[0])[0]
    print(decode_proxy(proxy_id))

First few results:

62.80.180.111:8080
68.183.221.156:38159
189.201.134.13:8080
178.60.201.44:8080
128.199.79.15:8080
139.59.78.193:8080
103.148.216.5:80
