Beautiful Soup not returning HTML

I use the script below to collect all tags from an HTML page, but instead of an HTML response I get something else:

import urllib.request
from bs4 import BeautifulSoup
loginurl= 'https://172.56.66.77'
fhand = urllib.request.urlopen(loginurl).read()
soup = BeautifulSoup(fhand,'html.parser')
print(soup)

I am trying to collect a particular value from the page, but Beautiful Soup does not give me HTML. Instead I get the response below:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="xslt.cgi"?>
<iconmenu>
<title>Geräteinformationen</title><prompt>Geräteinformationen anzhhas</prompt>
<menuitem/><iconindex>-1</iconindex><name>MAC-Adresse :  76238823354</name><url></url>
<menuitem/><iconindex>-1</iconindex><name>Host-Name : SEP76238823354</name><url></url>
</iconmenu>

I cannot filter the data because the response contains no HTML tags.

Please help me get the second value, SEP76238823354, from the response.

It turns out that you just need to remove the second argument, 'html.parser', from the constructor call. Beautiful Soup then picks the best parser it finds installed and warns you that it had to guess:

import urllib.request
from bs4 import BeautifulSoup
xml_doc = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="xslt.cgi"?>
<iconmenu>
<title>Geräteinformationen</title><prompt>Geräteinformationen anzhhas</prompt>
<menuitem/><iconindex>-1</iconindex><name>MAC-Adresse :  76238823354</name><url></url>
<menuitem/><iconindex>-1</iconindex><name>Host-Name : SEP76238823354</name><url></url>
</iconmenu>"""
soup = BeautifulSoup(xml_doc)
print(soup.find_all("name")[1])
# -> <name>Host-Name : SEP76238823354</name>

The response is XML, so parse it as XML (the 'xml' parser requires lxml to be installed). Then select the element you need, in this case the one whose text contains Host-Name, split() its text on the delimiter, and take the last part:

...
soup = BeautifulSoup(fhand, 'xml')
soup.select_one('name:-soup-contains("Host-Name")').text.split(': ')[-1]

Output:

SEP76238823354
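
For comparison, here is a minimal sketch of the same extraction without Beautiful Soup, using the standard library's xml.etree.ElementTree instead. This is a swapped-in technique, not what either answer above uses. The function name parse_device_info and the constant XML_RESPONSE are hypothetical, and the sample bytes are copied from the question rather than fetched from the device.

```python
import xml.etree.ElementTree as ET

# Sample response copied from the question. urlopen(...).read() returns
# bytes, so the sample is encoded to bytes as well.
XML_RESPONSE = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="xslt.cgi"?>
<iconmenu>
<title>Geräteinformationen</title><prompt>Geräteinformationen anzhhas</prompt>
<menuitem/><iconindex>-1</iconindex><name>MAC-Adresse :  76238823354</name><url></url>
<menuitem/><iconindex>-1</iconindex><name>Host-Name : SEP76238823354</name><url></url>
</iconmenu>""".encode("utf-8")


def parse_device_info(xml_bytes):
    """Map each <name> label to its value (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    info = {}
    for name in root.iter("name"):
        # Split "Host-Name : SEP..." at the first colon only.
        label, sep, value = (name.text or "").partition(":")
        if sep:  # skip <name> elements that have no "label : value" shape
            info[label.strip()] = value.strip()
    return info


device_info = parse_device_info(XML_RESPONSE)
print(device_info["Host-Name"])
```

Looking the value up by its label rather than by position (index 1) keeps the code working if the device returns the entries in a different order.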
