Parsing HTML and writing to CSV using BeautifulSoup: AttributeError or no HTML being parsed

With the following code, I either get an error or nothing gets parsed/written:

soup = BeautifulSoup(browser.page_source, 'html.parser')
userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
rows = userinfo.find_all(attrs="value")

with open('testfile1.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(rows)

    rows = userinfo.find_all(attrs="value")
AttributeError: 'ResultSet' object has no attribute 'find_all'

So I tried a for loop with print just to test it, but it prints nothing useful even though the program runs without errors:

userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
for row in userinfo:
    rows = row.find_all(attrs="value")
    print(rows)

This is the HTML I'm trying to parse. I'm trying to get the text stored in the value attributes:

<div class="controlHolder">
                        <div id="usernameWrapper" class="fieldWrapper">
                            <span class="styled">Username:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbUsername" type="text" value="username" maxlength="16" id="ctl00_cleanMainPlaceHolder_tbUsername" disabled="disabled" tabindex="1" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnUserName" id="ctl00_cleanMainPlaceHolder_hdnUserName" value="AAubrey"> 
                            </div>
                        </div>
                        <div id="fullNameWrapper" class="fieldWrapper">
                            <span class="styled">Full Name:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbFullName" type="text" value="Full Name" maxlength="50" id="ctl00_cleanMainPlaceHolder_tbFullName" tabindex="2" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnFullName" id="ctl00_cleanMainPlaceHolder_hdnFullName" value="Anthony Aubrey">
                            </div>
                        </div>
                        <div id="emailWrapper" class="fieldWrapper">
                            <span class="styled">Email:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbEmail" type="text" value="email@email.com" maxlength="60" id="ctl00_cleanMainPlaceHolder_tbEmail" tabindex="3" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnEmail" id="ctl00_cleanMainPlaceHolder_hdnEmail" value="aaubrey@bankatunited.com">
                                <span id="ctl00_cleanMainPlaceHolder_validateEmail" style="color:Red;display:none;">Invalid E-Mail</span>
                            </div>
                        </div>
                        <div id="commentWrapper" class="fieldWrapper">
                            <span class="styled">Comment:</span>
                            <div class="theField">
                                <textarea name="ctl00$cleanMainPlaceHolder$tbComment" rows="2" cols="20" id="ctl00_cleanMainPlaceHolder_tbComment" tabindex="4" class="textbox longTextBox"></textarea>
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnComment" id="ctl00_cleanMainPlaceHolder_hdnComment">
                            </div>
                        </div>

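Your first error comes from the fact that find_all returns a ResultSet (more or less a list), not a single element: you have to iterate over the elements of userinfo and call find_all on each of them.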
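A minimal sketch of that difference, run against a trimmed-down copy of the question's HTML (the inline `html` string and the `inputs` list are illustrative, not from the original code). `find_all` hands back a `ResultSet`, which is a `list` subclass, so the search has to go through each element in it:

```python
from bs4 import BeautifulSoup

# Illustrative, trimmed-down version of the question's markup
html = """
<div id="usernameWrapper" class="fieldWrapper">
  <input type="text" value="username">
</div>
<div id="fullNameWrapper" class="fieldWrapper">
  <input type="text" value="Full Name">
</div>
"""

soup = BeautifulSoup(html, "html.parser")
userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})

# find_all() returns a ResultSet, which is a list of matching tags
print(type(userinfo).__name__, len(userinfo))  # ResultSet 2
print(isinstance(userinfo, list))  # True

# Calling find_all() on the ResultSet itself reproduces the question's error
try:
    userinfo.find_all("input")
except AttributeError as exc:
    print("AttributeError:", exc)

# Iterating and calling find_all() on each tag works
inputs = []
for wrapper in userinfo:
    inputs.extend(wrapper.find_all("input"))
print(len(inputs))  # 2
```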

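As for your second problem, I'm fairly sure that when you pass a string to attrs, it searches for elements that have that string as their class. The HTML you provided contains no element with the class value, so it makes sense that nothing is printed. You can read an element's value attribute with .get('value').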
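A small sketch contrasting the two lookups on one field from the question's HTML (the trimmed `html` string is illustrative). Passing a dict such as `{"value": True}` to `attrs` matches any element that has a value attribute at all; `.get('value')` then reads it and returns `None` when the attribute is absent:

```python
from bs4 import BeautifulSoup

# Trimmed-down, illustrative copy of one field from the question
html = """
<div id="usernameWrapper" class="fieldWrapper">
  <span class="styled">Username:</span>
  <input type="text" value="username">
  <input type="hidden" value="AAubrey">
</div>
"""

soup = BeautifulSoup(html, "html.parser")
wrapper = soup.find("div", attrs={"class": "fieldWrapper"})

# A plain string for attrs is treated as a class filter;
# no element here has class="value", so nothing matches
by_class = wrapper.find_all(attrs="value")
print(by_class)  # []

# To match on the presence of a value attribute, pass a dict instead
with_value = wrapper.find_all(attrs={"value": True})

# .get('value') reads the attribute from each matching tag
values = [tag.get("value") for tag in with_value]
print(values)  # ['username', 'AAubrey']

# .get() returns None when the attribute is missing
span = wrapper.find("span")
print(span.get("value"))  # None
```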

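To print the values of the text inputs, the following code should work. (The try/except is only there so the script doesn't crash when a field has no text input.)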

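for field_wrapper in soup.find_all("div", attrs={"class": "fieldWrapper"}):
    try:
        # find() returns None when there is no text input (e.g. the Comment field uses a <textarea>),
        # and calling .get() on None raises AttributeError
        print(field_wrapper.find("input", attrs={"type": "text"}).get('value'))
    except AttributeError:
        continue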
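The code above stops at printing, but the question also wants the values in a CSV file. A possible sketch, assuming an inline HTML string in place of `browser.page_source` and a hypothetical three-column layout (field label, visible text value, hidden value); `testfile1.csv` is the filename from the question. Opening the file with `newline=""` follows the `csv` module documentation; without it, extra blank lines can appear between rows on Windows.

```python
import csv

from bs4 import BeautifulSoup

# Illustrative stand-in for browser.page_source from the question
html = """
<div class="controlHolder">
  <div id="usernameWrapper" class="fieldWrapper">
    <span class="styled">Username:</span>
    <div class="theField">
      <input type="text" value="username">
      <input type="hidden" value="AAubrey">
    </div>
  </div>
  <div id="commentWrapper" class="fieldWrapper">
    <span class="styled">Comment:</span>
    <div class="theField">
      <textarea></textarea>
      <input type="hidden">
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for field_wrapper in soup.find_all("div", attrs={"class": "fieldWrapper"}):
    # Use the visible label (minus the trailing colon) as the first column
    label = field_wrapper.find("span", attrs={"class": "styled"}).get_text(strip=True).rstrip(":")
    text_input = field_wrapper.find("input", attrs={"type": "text"})
    hidden_input = field_wrapper.find("input", attrs={"type": "hidden"})
    rows.append([
        label,
        # Fall back to an empty string when the input or its value attribute is missing
        text_input.get("value", "") if text_input is not None else "",
        hidden_input.get("value", "") if hidden_input is not None else "",
    ])

# csv.writer expects a list of strings per row, not BeautifulSoup tags
with open("testfile1.csv", "w", newline="", encoding="utf-8") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["field", "text_value", "hidden_value"])  # hypothetical header
    writer.writerows(rows)
```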

