Hey, there is a website that I'm trying to scrape, and some values sit inside inputs that don't come through as text, only as HTML, like this:
<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>
What I want is just the value (John Doe). I tried using .text, but it doesn't return it. This is the code:
soup = BeautifulSoup(r.content, 'lxml')
for name in soup.findAll('input', {'name': 'ctl00$ContentPlaceHolder1$EmpName'}):
    with io.open('x.txt', 'w', encoding="utf-8") as f:
        f.write(name.prettify())
The reason you get no result from .text is that "John Doe" is not part of the element's text; it is the value of an HTML attribute: value="John Doe". You can access an attribute the same way you would a key in a Python dictionary (dict), using tag[<attribute>]. (See the BeautifulSoup documentation on attributes.)
from bs4 import BeautifulSoup

html = """<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>"""
soup = BeautifulSoup(html, "lxml")
# find_all is the current name for the older findAll alias
for name in soup.find_all("input", {"name": "ctl00$ContentPlaceHolder1$EmpName"}):
    print(name["value"])  # read the attribute, not the (empty) text
Output:
John Doe
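To make the difference between text and attributes concrete, here is a minimal sketch. It assumes a shortened version of the input from the question and uses the built-in "html.parser" so it runs without lxml. tag.get() is the dictionary-style lookup that returns None instead of raising KeyError when an attribute is absent; the "placeholder" attribute is only there to illustrate a missing key.

```python
from bs4 import BeautifulSoup

# Shortened version of the <input> from the question (assumption: same name/value).
html = '<input id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" type="text" value="John Doe"/>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("input", {"name": "ctl00$ContentPlaceHolder1$EmpName"})

text = tag.text                   # empty string: an <input> has no text content
value = tag["value"]              # raises KeyError if the attribute is missing
safe_value = tag.get("value")     # same lookup, but returns None if missing
missing = tag.get("placeholder")  # attribute not present on this tag -> None
all_attrs = tag.attrs             # plain dict of every attribute on the tag

print(repr(text), value, safe_value, missing)
```

tag.get() is usually the safer choice when scraping pages where some inputs may not carry a value attribute.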
While the answer from MendelG works well, it can be a little cleaner without the for loop (if you only want to extract one element):
>>> soup.find('input')['value']
'John Doe'
Code:
from bs4 import BeautifulSoup
string = '''
<input class="aspNetDisabled" disabled="disabled" id="ContentPlaceHolder1_EmpName" name="ctl00$ContentPlaceHolder1$EmpName" style="color:#003366;background-color:#CCCCCC;font-weight:bold;height:27px;width:150px;" type="text" value="John Doe"/>
'''
soup = BeautifulSoup(string, 'html.parser')
john_come_here = soup.find('input')['value']
print(john_come_here)
Output:
John Doe
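One caveat with soup.find('input'): it returns the first input on the page, and None if there is no match, so on a real form with several inputs it may be safer to filter by id. A rough sketch follows; get_input_value is a hypothetical helper name, and the second input (ContentPlaceHolder1_EmpId) is made up for illustration.

```python
from bs4 import BeautifulSoup


def get_input_value(soup, input_id, default=None):
    """Return the value attribute of the <input> with the given id, or default."""
    tag = soup.find("input", id=input_id)
    if tag is None:  # find() returns None when nothing matches
        return default
    return tag.get("value", default)


# Hypothetical page with more than one input; only EmpName comes from the question.
page = """
<form>
  <input id="ContentPlaceHolder1_EmpId" type="text" value="1042"/>
  <input id="ContentPlaceHolder1_EmpName" type="text" value="John Doe"/>
</form>
"""
soup = BeautifulSoup(page, "html.parser")

name = get_input_value(soup, "ContentPlaceHolder1_EmpName")
absent = get_input_value(soup, "NoSuchField", default="")
print(name)
```

Without the None check, soup.find(...)['value'] raises a TypeError whenever the element is not on the page.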