I seem to be doing something wrong. I have an HTML source that I pull using urllib. From this HTML I use Beautiful Soup's findAll to select elements whose ID is in a specified list. This works, but the output is messy and includes line breaks ("\n").
I have tried to use prettify() to correct the output but always get an error:
AttributeError: 'ResultSet' object has no attribute 'prettify'
import urllib
import re
from bs4 import BeautifulSoup

cfile = open("test.txt")
clist = cfile.read()
clist = clist.split('\n')

i = 0
while i < len(clist):
    url = "https://example.com/" + clist[i]
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    soup = BeautifulSoup(htmltext, "html.parser")
    soup = soup.findAll(id=["id1", "id2", "id3"])
    print soup.prettify()
    i += 1
I'm sure there is something simple I am overlooking with this line:
soup = soup.findAll (id=["id1", "id2", "id3"])
I'm just not sure what. Sorry if this is a stupid question. I've only been using Python and Beautiful Soup for a few days.
You are reassigning the soup variable to the result of .findAll(), which is a ResultSet object (basically, a list of tags) which does not have the prettify() method.
The solution is to keep the soup variable pointing to the BeautifulSoup instance.
You can call prettify() on the top-level BeautifulSoup object, or on any of its Tag objects:
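As a minimal sketch of that idea (written for Python 3, using an inline HTML string instead of a fetched page; the sample markup and the names matches, pretty_tags and texts are made up for illustration), the matches can be stored under a separate name and each Tag prettified on its own:

```python
from bs4 import BeautifulSoup

# Illustrative HTML standing in for the downloaded page (an assumption).
html = '<div><p id="id1">first</p><p id="id2">second</p><p id="other">skip</p></div>'

# Keep `soup` bound to the BeautifulSoup object.
soup = BeautifulSoup(html, "html.parser")

# Store the ResultSet under a different name instead of overwriting `soup`.
# find_all is the current spelling of findAll; both accept a list of ids.
matches = soup.find_all(id=["id1", "id2", "id3"])

# A ResultSet behaves like a list of Tag objects, so prettify each Tag.
pretty_tags = [tag.prettify() for tag in matches]
for pretty in pretty_tags:
    print(pretty)

# If only the text content is needed, get_text(strip=True) drops the
# surrounding whitespace and newlines.
texts = [tag.get_text(strip=True) for tag in matches]
print(texts)
```

Whether prettify() or get_text() is the better fit depends on whether you want the markup or just the text of each matched element.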
findAll returns a list of matching tags, so your code is equivalent to [tag1, tag2, ...].prettify(), which will not work because a list has no prettify() method.