
Parsing HTML and writing to CSV using BeautifulSoup - AttributeError or no HTML being parsed

With the following code, I either get an error or nothing is parsed or written:

soup = BeautifulSoup(browser.page_source, 'html.parser')
userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
rows = userinfo.find_all(attrs="value")

with open('testfile1.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(rows)

rows = userinfo.find_all(attrs="value")

AttributeError: 'ResultSet' object has no attribute 'find_all'

So I tried a for loop with print just to test it, but that returns nothing even though the program runs successfully:

userinfo = soup.find_all("div", attrs={"class": "fieldWrapper"})
for row in userinfo:
    rows = row.find_all(attrs="value")
    print(rows)

This is the HTML I am trying to parse. I want to return the text from the value attributes:

<div class="controlHolder">
                        <div id="usernameWrapper" class="fieldWrapper">
                            <span class="styled">Username:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbUsername" type="text" value="username" maxlength="16" id="ctl00_cleanMainPlaceHolder_tbUsername" disabled="disabled" tabindex="1" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnUserName" id="ctl00_cleanMainPlaceHolder_hdnUserName" value="AAubrey"> 
                            </div>
                        </div>
                        <div id="fullNameWrapper" class="fieldWrapper">
                            <span class="styled">Full Name:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbFullName" type="text" value="Full Name" maxlength="50" id="ctl00_cleanMainPlaceHolder_tbFullName" tabindex="2" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnFullName" id="ctl00_cleanMainPlaceHolder_hdnFullName" value="Anthony Aubrey">
                            </div>
                        </div>
                        <div id="emailWrapper" class="fieldWrapper">
                            <span class="styled">Email:</span>
                            <div class="theField">
                                <input name="ctl00$cleanMainPlaceHolder$tbEmail" type="text" value="email@email.com" maxlength="60" id="ctl00_cleanMainPlaceHolder_tbEmail" tabindex="3" class="textbox longTextBox">
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnEmail" id="ctl00_cleanMainPlaceHolder_hdnEmail" value="aaubrey@bankatunited.com">
                                <span id="ctl00_cleanMainPlaceHolder_validateEmail" style="color:Red;display:none;">Invalid E-Mail</span>
                            </div>
                        </div>
                        <div id="commentWrapper" class="fieldWrapper">
                            <span class="styled">Comment:</span>
                            <div class="theField">
                                <textarea name="ctl00$cleanMainPlaceHolder$tbComment" rows="2" cols="20" id="ctl00_cleanMainPlaceHolder_tbComment" tabindex="4" class="textbox longTextBox"></textarea>
                                <input type="hidden" name="ctl00$cleanMainPlaceHolder$hdnComment" id="ctl00_cleanMainPlaceHolder_hdnComment">
                            </div>
                        </div>

Your first error stems from the fact that find_all returns a ResultSet, which is more or less a list of tags. A ResultSet has no find_all method of its own, so you have to iterate through the elements of userinfo and call find_all on each of those instead.
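As a minimal sketch of that iteration, the snippet below parses a trimmed-down stand-in for the page in the question (the `html` string is a made-up sample, not the real page source) and calls find_all on each element of the ResultSet rather than on the ResultSet itself.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample standing in for browser.page_source.
html = """
<div class="fieldWrapper"><input type="text" value="username"></div>
<div class="fieldWrapper"><input type="text" value="Full Name"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet: a list-like collection of Tag objects.
wrappers = soup.find_all("div", attrs={"class": "fieldWrapper"})

# Call find_all on each Tag inside the ResultSet, not on the ResultSet.
inputs_per_wrapper = [wrapper.find_all("input") for wrapper in wrappers]
counts = [len(inputs) for inputs in inputs_per_wrapper]
print(counts)
```

Each entry of `inputs_per_wrapper` is again a ResultSet, so the same rule applies one level down.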

For your second issue, I'm pretty sure that when attrs is passed a string, it searches for elements with that string as their class. The HTML you provided contains no elements with the class value, so it makes sense that nothing gets printed. You can read an element's value attribute with .get('value').
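The difference can be sketched as follows, again on a made-up sample rather than the real page. The dict form `attrs={"value": True}` is one way to match any tag that carries a value attribute, and `.get("value")` reads it, returning None when the attribute is absent. The `id` used for the last input is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two inputs with a value attribute, one without.
html = """
<div class="fieldWrapper">
  <input type="text" value="username">
  <input type="hidden" value="AAubrey">
  <input type="hidden" id="no_value_here">
</div>
"""

soup = BeautifulSoup(html, "html.parser")
wrapper = soup.find("div", attrs={"class": "fieldWrapper"})

# String form: treated as a search on the class attribute, so no match here.
by_class = wrapper.find_all(attrs="value")

# Dict form with True: matches tags that have a value attribute at all.
with_value = wrapper.find_all(attrs={"value": True})
values = [tag.get("value") for tag in with_value]

# .get returns None instead of raising when the attribute is missing.
missing = wrapper.find("input", attrs={"id": "no_value_here"}).get("value")

print(len(by_class), values, missing)
```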

To print out the value of the text inputs, the following code should work. (The try/except is just there so the script doesn't crash when a wrapper has no text input.)

for field_wrapper in soup.find_all("div", attrs={"class": "fieldWrapper"}):
    try:
        # find returns None when the wrapper has no text input
        print(field_wrapper.find("input", attrs={"type": "text"}).get('value'))
    except AttributeError:
        continue
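Since the original goal was a CSV file, here is a rough sketch of how the extracted values might be written out with csv.writer. The helper name `extract_rows`, the column headers, and the shortened `html` sample are all assumptions made for illustration. The sketch writes to an in-memory buffer so it can run anywhere.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample: the last wrapper has no text input and is skipped.
html = """
<div class="fieldWrapper">
  <span class="styled">Username:</span>
  <input type="text" value="username">
</div>
<div class="fieldWrapper">
  <span class="styled">Email:</span>
  <input type="text" value="email@email.com">
</div>
<div class="fieldWrapper">
  <span class="styled">Comment:</span>
  <textarea></textarea>
</div>
"""


def extract_rows(markup):
    """Return [label, value] pairs for wrappers that contain a text input."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for wrapper in soup.find_all("div", attrs={"class": "fieldWrapper"}):
        label = wrapper.find("span", attrs={"class": "styled"})
        text_input = wrapper.find("input", attrs={"type": "text"})
        if label is None or text_input is None:
            continue  # nothing to record for this wrapper
        rows.append([label.get_text(strip=True), text_input.get("value", "")])
    return rows


rows = extract_rows(html)

# Write one CSV row per field instead of passing the whole list to writerow.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["field", "value"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the buffer could be swapped for `open('testfile1.csv', 'w', newline='')`, which is the form the csv module documentation recommends for file objects.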
