Parsing the contents of the 'name' tag in the XML output using BeautifulSoup gives me the following error:
AttributeError: 'unicode' object has no attribute 'get_text'
XML Output:
<show>
<stud>
<__readonly__>
<TABLE_stud>
<ROW_stud>
<name>rice</name>
<dept>chem</dept>
.
.
.
</ROW_stud>
</TABLE_stud>
</__readonly__>
</stud>
</show>
However, if I access the contents of other tags like 'dept', it seems to work fine.
stud_info = output_xml.find_all('row_stud')
for eachStud in range(len(stud_info)):
    print stud_info[eachStud].dept.get_text()  # Gives 'chem'
    print stud_info[eachStud].name.get_text()  # ---Unicode Error---
Can any Python/BeautifulSoup experts help me to resolve this? (I know BeautifulSoup is not ideal for parsing XML, but let's just say I'm compelled to use it.)
Tag.name is an attribute containing the tag name; its value here is row_stud, a plain unicode string, which is why calling get_text() on it fails. Attribute access to contained tags is a shortcut for .find(attributename), but it only works if there isn't already an attribute in the API with the same name. Use .find() instead:
print stud_info[eachStud].find('name').get_text()
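To see why the shortcut fails only for 'name', here is a minimal sketch of the mechanism. FakeTag is a hypothetical stand-in written for this illustration, not BeautifulSoup's actual Tag class; it assumes only that the shortcut is implemented through Python's __getattr__ hook, which runs when normal attribute lookup fails.

```python
class FakeTag(object):
    """Simplified, hypothetical stand-in for a parsed element."""

    def __init__(self, name, text="", children=None):
        self.name = name              # real attribute: the element's own tag name
        self.text = text
        self.children = children or []

    def find(self, child_name):
        # Return the first direct child whose tag name matches, else None.
        for child in self.children:
            if child.name == child_name:
                return child
        return None

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal lookup fails, so it is
        # never reached for 'name', which already exists on the instance.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return self.find(attr)


row = FakeTag("row_stud", children=[FakeTag("name", "rice"),
                                    FakeTag("dept", "chem")])

print(row.dept.text)          # shortcut works: falls through to find('dept')
print(row.name)               # the row's own tag name, a plain string
print(row.find("name").text)  # explicit lookup reaches the child element
```

Because row.name resolves to the existing string attribute before the shortcut is ever consulted, only the explicit find('name') call reaches the child element.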
You can loop over the stud_info result list directly; there is no need to use range() here:
stud_info = output_xml.find_all('row_stud')
for eachStud in stud_info:
    print eachStud.dept.get_text()
    print eachStud.find('name').get_text()
I notice that you are searching for row_stud in lower-case. If you are parsing XML with BeautifulSoup, make sure that you have lxml installed and tell BeautifulSoup it is XML you are processing, so that it won't HTML-ize your tags (lowercase them). Tag names then keep their original case, so search for ROW_stud rather than row_stud:
soup = BeautifulSoup(source, 'xml')
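The lowercasing can be shown by analogy using only the Python 3 standard library, without BeautifulSoup itself. This sketch assumes a shortened version of the XML from the question; TagCollector is a hypothetical helper name. It compares the tag names reported by the stdlib HTML parser (which BeautifulSoup's 'html.parser' option builds on) with those reported by an XML parser.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Shortened sample based on the XML in the question.
source = "<TABLE_stud><ROW_stud><name>rice</name></ROW_stud></TABLE_stud>"


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(source)
collector.close()
html_tags = collector.tags

# An XML parser treats tag names as case-sensitive and leaves them alone.
xml_tags = [element.tag for element in ET.fromstring(source).iter()]

print(html_tags)  # ['table_stud', 'row_stud', 'name']
print(xml_tags)   # ['TABLE_stud', 'ROW_stud', 'name']
```

This is why find_all('row_stud') matches when the document is parsed as HTML, while an XML parse needs the original spelling, ROW_stud.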