I can't get the value of an HTML tag using BeautifulSoup in Python
Hey, there is a website that I'm trying to scrape, and some of its inputs hold values that don't come out as text, only as HTML, like this:
<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>
So what I want is just to get the value (John Doe). I tried using .text, but it doesn't scrape it. This is the code:
soup = BeautifulSoup(r.content, 'lxml')
for name in soup.findAll('input', {'name': 'ctl00$ContentPlaceHolder1$EmpName'}):
    with io.open('x.txt', 'w', encoding="utf-8") as f:
        f.write(name.prettify())
The reason you are not getting a result when calling .text is that "John Doe" is not in the text of the HTML; it is an HTML attribute: value="John Doe".
You can access the attribute as you would a key in a Python dictionary (dict), using tag[<attribute>]. (See the BeautifulSoup documentation on attributes.)
from bs4 import BeautifulSoup

html = """<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>"""
soup = BeautifulSoup(html, "lxml")
# The value lives in the tag's attributes, so index the tag like a dict.
for name in soup.findAll("input", {"name": "ctl00$ContentPlaceHolder1$EmpName"}):
    print(name["value"])
Output:
John Doe
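To make the difference between text and attributes concrete, here is a minimal sketch using a shortened version of the input tag from the question. It uses the built-in html.parser so that lxml is not required; the variable names and the "no placeholder" fallback string are only illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

# Shortened version of the <input> tag from the question.
html = '<input id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" type="text" value="John Doe"/>'

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("input", {"name": "ctl00$ContentPlaceHolder1$EmpName"})

# An <input> element has no child text, so .text is an empty string.
text_content = tag.text

# All attributes of the tag are available as a dictionary on .attrs.
attributes = tag.attrs

# tag["value"] raises KeyError if the attribute is missing;
# tag.get("value") returns None (or a supplied default) instead.
value = tag.get("value")
placeholder = tag.get("placeholder", "no placeholder")  # attribute not present here

print(repr(text_content))
print(value)
print(placeholder)
```

Using .get() is probably the safer choice when scraping pages where some inputs may lack a value attribute.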
While the answer from MendelG works great, it could be a bit cleaner without a for loop (if you want to extract only one element):
>>> soup.find('input')['value']
John Doe
Code:
from bs4 import BeautifulSoup
string = '''
<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>
'''
soup = BeautifulSoup(string, 'html.parser')
john_come_here = soup.find('input')['value']
print(john_come_here)
# Output: John Doe
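Applied to the original question, the same idea might look like the sketch below. Note that soup.find('input') returns the first input on the page, which on a real page may not be the one you want, so this version looks the tag up by its id. The inline html string stands in for r.content, the EmpId input is a hypothetical extra field added only to show why the lookup should be specific, and the file is written to the system temp directory rather than the working directory as an assumption for the sake of a self-contained example.

```python
import io
import os
import tempfile

from bs4 import BeautifulSoup

# Stand-in for r.content from the question; the first input is hypothetical.
html = """
<input id="ContentPlaceHolder1_EmpId" name="ctl00$ContentPlaceHolder1$EmpId" type="text" value="42"/>
<input id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" type="text" value="John Doe"/>
"""

soup = BeautifulSoup(html, "html.parser")

# Look the tag up by its id so the first <input> is not picked by accident.
tag = soup.find("input", id="ContentPlaceHolder1_EmpName")

# find() returns None when nothing matches, so guard before reading the attribute.
emp_name = tag.get("value", "") if tag is not None else ""

# Write only the value, not the prettified HTML of the tag.
out_path = os.path.join(tempfile.gettempdir(), "x.txt")
with io.open(out_path, "w", encoding="utf-8") as f:
    f.write(emp_name)

print(emp_name)
```

If several matching inputs need to be saved, opening the file once outside the loop avoids overwriting it on every iteration, which the 'w' mode inside the loop in the question would do.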