I can't get the value of an HTML tag using BeautifulSoup in Python

Question

There is a website I'm trying to scrape, and the values in its inputs don't come out as text; I only get the HTML, like this:

<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>

What I want is just the value (John Doe). I tried using .text, but it doesn't return it. This is the code:

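import io
from bs4 import BeautifulSoup

# r is the HTTP response fetched earlier (not shown in the question)
soup = BeautifulSoup(r.content, 'lxml')
for name in soup.findAll('input', {'name': 'ctl00$ContentPlaceHolder1$EmpName'}):
    with io.open('x.txt', 'w', encoding="utf-8") as f:
        f.write(name.prettify())  # writes the whole tag's HTML, not just the value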

Answer 1

Calling .text returns nothing because "John Doe" is not part of the element's text; it is an HTML attribute: value="John Doe".
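To make the difference concrete, here is a minimal sketch that compares .text with the attribute lookup. It assumes a shortened version of the input tag from the question and uses the built-in html.parser so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Shortened version of the tag from the question (styling attributes dropped).
html = '<input name="ctl00$ContentPlaceHolder1$EmpName" type="text" value="John Doe"/>'
tag = BeautifulSoup(html, "html.parser").find("input")

text = tag.text        # text between the opening and closing tags: none for an input
value = tag["value"]   # the attribute that actually holds the name

print(repr(text), repr(value))
```

An input element has no child text, so .text is an empty string while the value attribute carries the data.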

You can access an attribute the same way you index a Python dictionary (dict), using tag[<attribute>]. (See the BeautifulSoup documentation on attributes.)

from bs4 import BeautifulSoup

html = """<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>"""

soup = BeautifulSoup(html, "lxml")
# Match every input with this name and print its value attribute
for name in soup.find_all("input", {"name": "ctl00$ContentPlaceHolder1$EmpName"}):
    print(name["value"])

Output:

John Doe
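Indexing with tag["value"] raises a KeyError when the attribute is missing. If some inputs on the page might have no value, a hedged alternative is Tag.get, which returns a default instead. The sketch below assumes simplified markup; the second input and its name (EmpId) are made up for illustration and are not from the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second input has no value attribute.
html = """
<input name="ctl00$ContentPlaceHolder1$EmpName" type="text" value="John Doe"/>
<input name="ctl00$ContentPlaceHolder1$EmpId" type="text"/>
"""

soup = BeautifulSoup(html, "html.parser")

values = {}
for tag in soup.find_all("input"):
    # .get() returns the given default instead of raising KeyError
    values[tag["name"]] = tag.get("value", "")

# .attrs exposes all attributes of a tag as a plain dict
first_attrs = soup.find("input").attrs

print(values)
print(first_attrs)
```

Using .get keeps the loop from failing on the first input that lacks the attribute, at the cost of silently recording an empty string for it.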

Answer 2

While the answer from MendelG works great, it can be a bit cleaner without a for loop (if you want to extract only one element):

>>> soup.find('input')['value']
'John Doe'

Code:

from bs4 import BeautifulSoup

string = '''
<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>
'''

soup = BeautifulSoup(string, 'html.parser')

john_come_here = soup.find('input')['value']
print(john_come_here)

>>> John Doe
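Note that soup.find('input') returns the first input in the document, which on a full page may not be the one you want. A minimal sketch, assuming a page with several inputs, that targets the element by its id and guards against a missing match; the helper name get_input_value and the extra Search input are hypothetical.

```python
from bs4 import BeautifulSoup


def get_input_value(soup, input_id):
    """Return the value attribute of the input with this id, or None if absent."""
    tag = soup.find("input", id=input_id)
    if tag is None:
        return None
    return tag.get("value")


# Hypothetical page fragment with more than one input.
page = '''
<form>
<input id="ContentPlaceHolder1_Search" type="text" value=""/>
<input id="ContentPlaceHolder1_EmpName" type="text" value="John Doe"/>
</form>
'''

soup = BeautifulSoup(page, "html.parser")
emp_name = get_input_value(soup, "ContentPlaceHolder1_EmpName")
missing = get_input_value(soup, "no_such_id")

print(emp_name)
print(missing)
```

Checking for None first avoids the TypeError you would get from subscripting the result of find when nothing matches.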


Answer 1
0 Accepted 2021-06-30 04:06:46

Answer 2
0 2021-06-30 05:10:47
