python beautifulsoup can't prettify

Question

I seem to be doing something wrong. I have an HTML source that I pull using urllib. From this HTML I use BeautifulSoup's findAll to select the elements whose ID appears in a specified list. This works, but the output is messy and includes line breaks ("\n").

Python: 2.7.12
BeautifulSoup: bs4

I have tried to use prettify() to correct the output but always get an error:

AttributeError: 'ResultSet' object has no attribute 'prettify'

import urllib
import re
from bs4 import BeautifulSoup

cfile = open("test.txt")
clist = cfile.read()
clist = clist.split('\n')

i=0

while i<len (clist):
    url = "https://example.com/"+clist[i]
    htmlfile = urllib.urlopen (url)
    htmltext = htmlfile.read()

    soup = BeautifulSoup (htmltext, "html.parser")
    soup = soup.findAll (id=["id1", "id2", "id3"])

    print soup.prettify()
    i+=1

I'm sure there is something simple I am overlooking with this line:

soup = soup.findAll (id=["id1", "id2", "id3"])

I'm just not sure what. Sorry if this is a stupid question. I've only been using Python and Beautiful Soup for a few days.

Answer 1

You are reassigning the soup variable to the result of .findAll(), which is a ResultSet object (basically, a list of tags). A ResultSet does not have the prettify() method.

The solution is to keep the soup variable pointing to the BeautifulSoup instance.
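
As a minimal sketch of that fix: the inline HTML and the names matches and pretty_blocks below are made up for illustration, standing in for the page fetched with urllib. The result of find_all() (the current spelling of findAll()) is stored under its own name, and prettify() is called on each Tag rather than on the ResultSet:

```python
from bs4 import BeautifulSoup

# Stand-in for the page fetched with urllib; this markup is invented for the example.
html = """
<html><body>
<div id="id1"><p>first</p></div>
<div id="other">skipped</div>
<span id="id3">third</span>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Store the matches under a new name so soup still refers to the parsed document.
matches = soup.find_all(id=["id1", "id2", "id3"])

# prettify() exists on the BeautifulSoup object and on each Tag, not on the ResultSet.
pretty_blocks = [tag.prettify() for tag in matches]

for block in pretty_blocks:
    print(block)
```

Because soup is never overwritten, soup.prettify() is still available if you want the whole document formatted instead.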

Answer 2

You can call prettify() on the top-level BeautifulSoup object or on any of its Tag objects.

findAll returns a list of matching tags, so your code is equivalent to [tag1, tag2, ...].prettify(), which will not work.
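
A small sketch of the difference, assuming a two-paragraph snippet in place of the real page; prettify_all is a hypothetical helper, not part of Beautiful Soup:

```python
from bs4 import BeautifulSoup

# Invented markup standing in for the downloaded page.
soup = BeautifulSoup('<p id="id1">one</p><p id="id2">two</p>', "html.parser")
tags = soup.find_all(id=["id1", "id2"])

# Calling prettify() on the ResultSet itself raises AttributeError.
try:
    tags.prettify()
    failed = False
except AttributeError:
    failed = True


def prettify_all(result_set):
    """Hypothetical helper: prettify each Tag and join the pieces into one string."""
    return "\n".join(tag.prettify() for tag in result_set)


combined = prettify_all(tags)
print(combined)
```

Joining the per-tag strings keeps the matches in document order and gives you one string to print or write to a file.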

solution1
3 2016-11-16 22:08:17

solution2
0 2016-11-17 00:36:03
