![](/img/trans.png)
[英]Parsing a long html using BeautifulSoup failed with half parsed output
[英]Parsing HTML and writing to CSV using Beautifulsoup - AttributeError or no html being parsed
我接收到错误,或者使用以下代码未解析/编写任何内容:
soup = BeautifulSoup(browser.page_source, 'html.parser')
userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
rows = userinfo.find_all(attrs="value")
with open('testfile1.csv', 'w') as outfile:
writer = csv.writer(outfile)
writer.writerow(rows)
行= userinfo.find_all(attrs =“ value”)
AttributeError:“ ResultSet”对象没有属性“ find_all”
因此,我尝试使用print进行for循环只是为了对其进行测试,但是在程序成功运行时它什么也没返回:
userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
for row in userinfo:
rows = row.find_all(attrs="value")
print(rows)
这是我要解析的html。 我试图从值属性返回文本:
<div class="controlHolder">
<div id="usernameWrapper" class="fieldWrapper">
<span class="styled">Username:</span>
<div class="theField">
<input name="ctl00$cleanMainPlaceHolder$tbUsername" type="text" value="username" maxlength="16" id="ctl00_cleanMainPlaceHolder_tbUsername" disabled="disabled" tabindex="1" class="textbox longTextBox">
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnUserName" id="ctl00_cleanMainPlaceHolder_hdnUserName" value="AAubrey">
</div>
</div>
<div id="fullNameWrapper" class="fieldWrapper">
<span class="styled">Full Name:</span>
<div class="theField">
<input name="ctl00$cleanMainPlaceHolder$tbFullName" type="text" value="Full Name" maxlength="50" id="ctl00_cleanMainPlaceHolder_tbFullName" tabindex="2" class="textbox longTextBox">
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnFullName" id="ctl00_cleanMainPlaceHolder_hdnFullName" value="Anthony Aubrey">
</div>
</div>
<div id="emailWrapper" class="fieldWrapper">
<span class="styled">Email:</span>
<div class="theField">
<input name="ctl00$cleanMainPlaceHolder$tbEmail" type="text" value="email@email.com" maxlength="60" id="ctl00_cleanMainPlaceHolder_tbEmail" tabindex="3" class="textbox longTextBox">
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnEmail" id="ctl00_cleanMainPlaceHolder_hdnEmail" value="aaubrey@bankatunited.com">
<span id="ctl00_cleanMainPlaceHolder_validateEmail" style="color:Red;display:none;">Invalid E-Mail</span>
</div>
</div>
<div id="commentWrapper" class="fieldWrapper">
<span class="styled">Comment:</span>
<div class="theField">
<textarea name="ctl00$cleanMainPlaceHolder$tbComment" rows="2" cols="20" id="ctl00_cleanMainPlaceHolder_tbComment" tabindex="4" class="textbox longTextBox"></textarea>
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnComment" id="ctl00_cleanMainPlaceHolder_hdnComment">
</div>
</div>
您的第一个错误是由于find_all
返回一个ResultSet(或多或少是一个列表)这一事实:您必须遍历userinfo
的元素, find_all
对这些元素调用find_all
。
对于您的第二个问题,我非常确定何时将字符串传递给attrs
,它会搜索以该字符串为类的元素。 您提供的html不包含带有class value
元素,因此有意义的是什么也不会打印出来。 您可以使用.get('value')
访问元素的值
要打印出文本输入的值,以下代码应该起作用。 (try / except只是为了使脚本在找不到文本输入时不会崩溃)
for field_wrapper in soup.find_all("div", attrs={"class": "fieldWrapper"}):
try:
print(field_wrapper.find("input", attrs={"type": "text"}).get('value'))
except:
continue
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.