Parsing HTML and writing to CSV using Beautifulsoup - AttributeError or no html being parsed
I either get an error, or nothing is parsed/written, with the following code:
soup = BeautifulSoup(browser.page_source, 'html.parser')
userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
rows = userinfo.find_all(attrs="value")
with open('testfile1.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(rows)
The error is:
    rows = userinfo.find_all(attrs="value")
AttributeError: 'ResultSet' object has no attribute 'find_all'
So I tried a for loop with print just to test it, but it returns nothing even though the program runs successfully:
userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
for row in userinfo:
    rows = row.find_all(attrs="value")
    print(rows)
This is the HTML I am trying to parse. I am trying to return the text from the value attribute:
<div class="controlHolder">
<div id="usernameWrapper" class="fieldWrapper">
<span class="styled">Username:</span>
<div class="theField">
<input name="ctl00$cleanMainPlaceHolder$tbUsername" type="text" value="username" maxlength="16" id="ctl00_cleanMainPlaceHolder_tbUsername" disabled="disabled" tabindex="1" class="textbox longTextBox">
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnUserName" id="ctl00_cleanMainPlaceHolder_hdnUserName" value="AAubrey">
</div>
</div>
<div id="fullNameWrapper" class="fieldWrapper">
<span class="styled">Full Name:</span>
<div class="theField">
<input name="ctl00$cleanMainPlaceHolder$tbFullName" type="text" value="Full Name" maxlength="50" id="ctl00_cleanMainPlaceHolder_tbFullName" tabindex="2" class="textbox longTextBox">
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnFullName" id="ctl00_cleanMainPlaceHolder_hdnFullName" value="Anthony Aubrey">
</div>
</div>
<div id="emailWrapper" class="fieldWrapper">
<span class="styled">Email:</span>
<div class="theField">
<input name="ctl00$cleanMainPlaceHolder$tbEmail" type="text" value="email@email.com" maxlength="60" id="ctl00_cleanMainPlaceHolder_tbEmail" tabindex="3" class="textbox longTextBox">
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnEmail" id="ctl00_cleanMainPlaceHolder_hdnEmail" value="aaubrey@bankatunited.com">
<span id="ctl00_cleanMainPlaceHolder_validateEmail" style="color:Red;display:none;">Invalid E-Mail</span>
</div>
</div>
<div id="commentWrapper" class="fieldWrapper">
<span class="styled">Comment:</span>
<div class="theField">
<textarea name="ctl00$cleanMainPlaceHolder$tbComment" rows="2" cols="20" id="ctl00_cleanMainPlaceHolder_tbComment" tabindex="4" class="textbox longTextBox"></textarea>
<input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnComment" id="ctl00_cleanMainPlaceHolder_hdnComment">
</div>
</div>
Your first error comes from the fact that find_all returns a ResultSet (more or less a list): you have to iterate over the elements of userinfo and call find_all on each of those elements.
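A minimal sketch of that iteration, using a trimmed-down stand-in for the HTML in the question (the sample markup and the names wrappers and inputs are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Assumed sample: a shortened version of the markup from the question.
html = """
<div class="fieldWrapper"><input type="text" value="username"></div>
<div class="fieldWrapper"><input type="text" value="Full Name"></div>
"""

soup = BeautifulSoup(html, "html.parser")
wrappers = soup.find_all("div", attrs={"class": "fieldWrapper"})

# find_all returns a ResultSet, which is a list of Tag objects.
# The ResultSet itself has no find_all, so call it on each Tag instead.
inputs = []
for wrapper in wrappers:
    inputs.extend(wrapper.find_all("input"))

print(len(wrappers), len(inputs))
```

Calling wrappers.find_all(...) directly is what raises the AttributeError shown in the question.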
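As for your second problem, I'm fairly sure that when you pass a string to attrs, it searches for elements that have that string as their class. The HTML you provided contains no elements with the class value, so it makes sense that nothing is printed. You can access an element's value attribute with .get('value').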
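The difference can be sketched as follows. The sample markup is an assumed excerpt of the question's HTML, and filtering with attrs={"value": True} is one possible way to select tags that carry a value attribute, not something the original answer prescribes:

```python
from bs4 import BeautifulSoup

# Assumed sample: one wrapper taken from the question's markup, simplified.
html = """
<div class="fieldWrapper">
  <input type="text" value="username">
  <input type="hidden" value="AAubrey">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string for attrs is treated as a CSS class filter,
# so this looks for class="value" and finds nothing here.
by_class = soup.find_all(attrs="value")

# A dict with True as the value matches any tag that has that attribute.
with_value = soup.find_all(attrs={"value": True})

# .get('value') reads the attribute from each matching tag.
values = [tag.get("value") for tag in with_value]
print(by_class, values)
```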
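To print out the values of the text inputs, the following code should work. (The try/except is only there so that the script doesn't crash when no text input is found.)
for field_wrapper in soup.find_all("div", attrs={"class": "fieldWrapper"}):
    try:
        print(field_wrapper.find("input", attrs={"type": "text"}).get('value'))
    except AttributeError:
        # find() returned None: this wrapper has no text input
        continue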
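Since the question's goal was to write the values to a CSV file, here is a hedged sketch that combines the loop above with csv.writer. The helper name text_input_values, the shortened sample HTML, and the use of an in-memory buffer in place of testfile1.csv are all assumptions made for illustration:

```python
import csv
import io

from bs4 import BeautifulSoup

# Assumed sample: shortened version of the question's markup.
html = """
<div class="fieldWrapper"><input type="text" value="username"></div>
<div class="fieldWrapper"><input type="text" value="Full Name"></div>
<div class="fieldWrapper"><textarea rows="2"></textarea></div>
"""


def text_input_values(soup):
    """Collect the value attribute of the first text input in each fieldWrapper div."""
    values = []
    for wrapper in soup.find_all("div", attrs={"class": "fieldWrapper"}):
        field = wrapper.find("input", attrs={"type": "text"})
        if field is not None:  # skip wrappers without a text input
            values.append(field.get("value"))
    return values


soup = BeautifulSoup(html, "html.parser")
values = text_input_values(soup)

# In-memory stand-in for: open('testfile1.csv', 'w', newline='')
buffer = io.StringIO()
csv.writer(buffer).writerow(values)
print(buffer.getvalue())
```

Checking for None explicitly avoids the try/except, and passing newline='' when opening a real file is what the csv module documentation recommends to prevent blank lines on some platforms.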