
I can't get the value of an HTML tag using BeautifulSoup in Python

Hey, there is a website that I'm trying to scrape, and the values in its inputs don't come through as text, only as HTML, like this:

<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>

What I want is just the value (John Doe). I tried using .text, but it doesn't scrape it. This is the code:

soup = BeautifulSoup(r.content, 'lxml')
for name in soup.findAll('input', {'name': 'ctl00$ContentPlaceHolder1$EmpName'}):
    with io.open('x.txt', 'w', encoding="utf-8") as f:
        f.write(name.prettify())

You get no result from .text because "John Doe" is not part of the element's text. It is the value of an HTML attribute: value="John Doe".

You can access an attribute on a tag the same way you look up a key in a Python dictionary (dict), using tag[<attribute>]. (See the BeautifulSoup documentation on attributes.)

from bs4 import BeautifulSoup

html = """<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>"""

soup = BeautifulSoup(html, "lxml")
for name in soup.findAll("input", {"name": "ctl00$ContentPlaceHolder1$EmpName"}):
    print(name["value"])

Output:

John Doe
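
Indexing with tag["value"] raises a KeyError when a tag has no such attribute. If some inputs on the page might lack a value, Tag.get with a default is one way to avoid that. The following is a minimal sketch, not part of the original answer: the second input (EmpDept) is made up for illustration, and html.parser is used only so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second input (EmpDept) is invented and has no value attribute.
html = """
<input name="ctl00$ContentPlaceHolder1$EmpName" type="text" value="John Doe"/>
<input name="ctl00$ContentPlaceHolder1$EmpDept" type="text"/>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag.get returns the default instead of raising KeyError when the attribute is absent.
values = [tag.get("value", "") for tag in soup.find_all("input")]
print(values)

# Tag.attrs exposes all attributes of a tag as a plain dictionary.
first_attrs = soup.find("input").attrs
print(first_attrs["type"])
```

Here values holds one entry per input, with an empty string standing in for the missing attribute.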

While the answer from MendelG works great, it could be a bit cleaner without a for loop (if you want to extract only one element):

>>> soup.find('input')['value']
'John Doe'

Code:

from bs4 import BeautifulSoup

string = '''
<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>
'''

soup = BeautifulSoup(string, 'html.parser')

john_come_here = soup.find('input')['value']
print(john_come_here)

>>> John Doe
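
Note that soup.find('input') returns the first input in the document, which may not be the one you want on a page with several fields. A rough sketch of narrowing the lookup by id follows; the EmpId field and its value are invented for illustration, and the real page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the EmpId input is invented to show the pitfall.
html = """
<form>
  <input id="ContentPlaceHolder1_EmpId" type="text" value="1042"/>
  <input id="ContentPlaceHolder1_EmpName" type="text" value="John Doe"/>
</form>
"""

soup = BeautifulSoup(html, "html.parser")

# find() with only a tag name picks the first match, here the EmpId field.
first_value = soup.find("input")["value"]

# Filtering on the id attribute targets the intended field.
by_id = soup.find("input", id="ContentPlaceHolder1_EmpName")

# The same lookup written as a CSS selector.
by_css = soup.select_one("input#ContentPlaceHolder1_EmpName")

# find() returns None when nothing matches, so guard before indexing.
name_value = by_id["value"] if by_id is not None else None
print(first_value, name_value)
```

Matching on the name attribute, as in the question's code, works just as well; the id is used here only because it is shorter.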
