Unable to extract data using BeautifulSoup
I am trying to get the list of servers from this sample.
from bs4 import BeautifulSoup as bs
with open('html.txt', 'r') as html:
    soup = bs(html, 'html.parser')
    div = soup.find('div', class_='grid_8')
    for tag in div:
        tag = div.find_all('td', class_='StatTDLabel')[2].text
        print(tag)
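I can get the first server in the list, but I can't iterate over the rest. When I try a for loop, I get the same result on every iteration.
Is this what you're looking for?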
from bs4 import BeautifulSoup
from tabulate import tabulate
sample_html = """The contents of your pastebin"""
soup = BeautifulSoup(sample_html, "html.parser").find_all("tr")
servers = [
    [i.getText(strip=True) for i in row.find_all("td")] for row in soup[1:]
]
print(tabulate(servers, headers=["Country", "Location", "Address", "Status"]))
Output:
Country    Location      Address               Status
---------  ------------  --------------------  -------------
ZA         Johannesburg  jnb-c17.ipvanish.com  15 % capacity
ZA         Johannesburg  jnb-c18.ipvanish.com  15 % capacity
ZA         Johannesburg  jnb-c19.ipvanish.com  31 % capacity
ZA         Johannesburg  jnb-c20.ipvanish.com  12 % capacity
ZA         Johannesburg  jnb-c21.ipvanish.com  9 % capacity
ZA         Johannesburg  jnb-c22.ipvanish.com  10 % capacity
AL         Tirana        tia-c02.ipvanish.com  17 % capacity
AL         Tirana        tia-c03.ipvanish.com  23 % capacity
AL         Tirana        tia-c04.ipvanish.com  19 % capacity
AL         Tirana        tia-c05.ipvanish.com  15 % capacity
AE         Dubai         dxb-c01.ipvanish.com  30 % capacity
AE         Dubai         dxb-c02.ipvanish.com  26 % capacity
To get only the server addresses, select the third column, which has index 2.
For example:
servers = [
    [i.getText(strip=True) for i in row.find_all("td")][2] for row in soup[1:]
]
print("\n".join(servers))
Output:
jnb-c17.ipvanish.com
jnb-c18.ipvanish.com
jnb-c19.ipvanish.com
jnb-c20.ipvanish.com
jnb-c21.ipvanish.com
jnb-c22.ipvanish.com
tia-c02.ipvanish.com
tia-c03.ipvanish.com
tia-c04.ipvanish.com
tia-c05.ipvanish.com
dxb-c01.ipvanish.com
dxb-c02.ipvanish.com
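For reference, the loop in the question repeats the same `find_all(...)[2]` lookup on every pass, which is why it keeps printing one value. Below is a minimal self-contained sketch of the row-by-row approach. The pastebin contents are not reproduced here, so `SAMPLE_HTML` is a hypothetical stand-in and the real markup may differ; `server_addresses` is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the pastebin HTML; the real markup may differ.
SAMPLE_HTML = """
<div class="grid_8">
  <table>
    <tr><th>Country</th><th>Location</th><th>Address</th><th>Status</th></tr>
    <tr><td class="StatTDLabel">ZA</td><td class="StatTDLabel">Johannesburg</td>
        <td class="StatTDLabel">jnb-c17.ipvanish.com</td><td class="StatTDLabel">15 % capacity</td></tr>
    <tr><td class="StatTDLabel">AL</td><td class="StatTDLabel">Tirana</td>
        <td class="StatTDLabel">tia-c02.ipvanish.com</td><td class="StatTDLabel">17 % capacity</td></tr>
  </table>
</div>
"""


def server_addresses(html, column=2):
    """Return the text of one column for every data row in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    addresses = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        # Header rows (th only) and short rows have no cell at this index.
        if len(cells) > column:
            addresses.append(cells[column].get_text(strip=True))
    return addresses


print(server_addresses(SAMPLE_HTML))
```

Checking the number of cells instead of slicing with `soup[1:]` skips the header row without assuming it always comes first.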
Try this:
from bs4 import BeautifulSoup as bs
with open('html.txt', 'r') as html:
    soup = bs(html, 'html.parser')
    # Look up the container as in the question; `div` was otherwise undefined here.
    div = soup.find('div', class_='grid_8')
    tags = div.find_all('td', class_='StatTDLabel')
    for tag in tags:
        # Take only the element's immediate text and ignore text in child elements.
        tagtext = tag.find(text=True, recursive=False)
        if tagtext:
            print(tagtext)
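To illustrate what "immediate text" means, here is a small sketch comparing it with the full text of a cell. The cell markup is a hypothetical example, not taken from the pastebin. It uses `string=True`, the current name for the argument that older Beautiful Soup versions call `text`.

```python
from bs4 import BeautifulSoup

# Hypothetical cell: direct text followed by a nested element.
cell_html = '<td class="StatTDLabel">jnb-c17.ipvanish.com<span>15 % capacity</span></td>'
cell = BeautifulSoup(cell_html, "html.parser").td

# Text of the cell and all of its descendants, concatenated.
full_text = cell.get_text()
# First text node that is a direct child of the cell; nested tags are skipped.
own_text = cell.find(string=True, recursive=False)

print(full_text)
print(own_text)
```

With this sample, `full_text` includes the text of the nested `span`, while `own_text` is only the address.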