Reading an XML file from URL in Python

Question

I would like to read the integers inside the <count> tags.

This is the code I have written:

import xml.etree.ElementTree as ET
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url =  'http://py4e-data.dr-chuck.net/comments_42.xml'
content1 = urllib.request.urlopen(url, context = ctx).read()
soup = BeautifulSoup(content1, 'html.parser')

tree = ET.fromstring(soup)
tags = tree.findall('count')
print(tags)

It throws an error:

Traceback (most recent call last):
  File "C:\Users\Name\Desktop\Py4e\Me\Assi_15_01.py", line 15, in <module>
    tree = ET.fromstring(soup)

  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1320, in XML
    parser.feed(text)
TypeError: a bytes-like object is required, not 'BeautifulSoup'

What can I do?

More information: http://py4e-data.dr-chuck.net/comments_42.xml

Answer 1

There's no need to use xml.etree; just select all <count> tags with BeautifulSoup:

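import requests
from bs4 import BeautifulSoup

url = 'http://py4e-data.dr-chuck.net/comments_42.xml'

# Download the document and hand the raw bytes to BeautifulSoup
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# select('count') matches every <count> tag, however deeply nested
for c in soup.select('count'):
    print(int(c.text))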

This prints each count as an integer, one per line.

Answer 2

I don't think you need to use ElementTree. Just change BeautifulSoup to use the lxml parser (change 'html.parser' to 'lxml') and call the find_all method on soup, not tree (i.e. soup.find_all('count')).
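
If you would rather keep xml.etree as in the question, here is a minimal sketch of how that could work. The TypeError comes from passing a BeautifulSoup object to ET.fromstring, which expects str or bytes, so the sketch passes the raw response body instead. It also searches with './/count' rather than 'count', on the assumption that the <count> elements are nested below the root rather than being its direct children. The helper names extract_counts and fetch_counts and the SAMPLE document are hypothetical, made up here so the example runs offline; they are not taken from the question or the real file.

```python
import urllib.request
import xml.etree.ElementTree as ET


def extract_counts(xml_bytes):
    """Return the integer value of every <count> element in the document."""
    # ET.fromstring accepts str or bytes, so the raw response body can be
    # passed in directly; no BeautifulSoup object is involved.
    root = ET.fromstring(xml_bytes)
    # './/count' searches all descendants; a bare 'count' would only match
    # direct children of the root element.
    return [int(node.text) for node in root.findall('.//count')]


def fetch_counts(url):
    """Download the XML at url and extract its counts (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return extract_counts(response.read())


# Hypothetical sample with an assumed nesting, so the sketch runs offline.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<commentinfo>
  <comments>
    <comment><name>A</name><count>97</count></comment>
    <comment><name>B</name><count>90</count></comment>
  </comments>
</commentinfo>"""

counts = extract_counts(SAMPLE)
print(counts, sum(counts))
```

Against the real file you would call fetch_counts('http://py4e-data.dr-chuck.net/comments_42.xml'). Note that with this nesting, root.findall('count') returns an empty list, which is what the question's code would print even after the TypeError is fixed.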

2 answers

solution1
1 ACCEPTED 2020-07-07 22:02:52

solution2
-1 2020-07-07 22:04:14
