How do I add attributes to the tags in dexml?
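I'm using dexml, a Python XML serialization library. I can't work out how to put attributes on some of the tags in the XML I generate from my objects. I read through the documentation, and unless I missed something, I couldn't find a clear explanation.
Here is the relevant code.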
import dexml
import urllib2
from dexml import fields
from bs4 import BeautifulSoup

class Section(dexml.Model):
    section = fields.String()
    entries = fields.List(fields.String(tagname="Entry"))
    # Add something for href here, maybe?

class AtoZ(dexml.Model):
    list = fields.List(Section)

def makeSoup(url):
    return BeautifulSoup(urllib2.urlopen(url).read())

def main():
    soup = makeSoup("http://www.somewebsite.com")
    sectionList = []
    # You might wonder about the length of this; I *could* split it up
    # into variables to make it shorter. Also, the chaining is because
    # the 'li' I want are only inside of a <ul class="Nav_fm">.
    for li in soup.find('ul', {'class':"Nav_fm"}).find_all('li', {'class':"MenuLevel_0"}):
        atzSection = Section()
        atzSection.section = li.a.string
        for innerLi in li.find_all('li', {'class':"MenuLevel_1"}):
            atzSection.entries.append(innerLi.a.string)
            # Somehow store innerLi.a['href'] in atzSection
        sectionList.append(atzSection)
    atzList = AtoZ(list=sectionList)
    f = open("C:\\atoz.xml", "w")
    f.write(atzList.render(pretty=True))
    f.close()

if __name__ == '__main__':
    main()
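The loop above already reaches each link through innerLi.a['href']; the missing piece is keeping that href paired with its text. As a stdlib-only sketch for Python 3 (html.parser swapped in for BeautifulSoup so it runs without extra packages), the hypothetical AtoZMenuParser below collects (text, href) pairs per section. The menu markup it expects is an assumption inferred from the class names in the question, not the real page.

```python
from html.parser import HTMLParser

class AtoZMenuParser(HTMLParser):
    """Hypothetical helper: collect [section_name, [(entry_text, href), ...]].

    Assumed markup (inferred from the class names in the question):
      <li class="MenuLevel_0"><a>A</a>
        <ul><li class="MenuLevel_1"><a href="/a/x">X</a></li></ul>
      </li>
    """

    def __init__(self):
        super().__init__()
        self.sections = []   # one [name, entries] pair per MenuLevel_0 item
        self.level = None    # "section" or "entry": which kind of <li> we are inside
        self.href = None     # href of the <a> currently open
        self.text = None     # text chunks of the open <a>; None when no <a> is open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "MenuLevel_0" in classes:
            self.sections.append(["", []])
            self.level = "section"
        elif tag == "li" and "MenuLevel_1" in classes:
            self.level = "entry"
        elif tag == "a" and self.level is not None:
            self.href = attrs.get("href")
            self.text = []

    def handle_data(self, data):
        if self.text is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.text is not None:
            text = "".join(self.text).strip()
            if self.level == "section":
                self.sections[-1][0] = text
            else:
                # Keep the link text and its href together, the equivalent of
                # (innerLi.a.string, innerLi.a['href']) in the BeautifulSoup loop.
                self.sections[-1][1].append((text, self.href))
            self.text = None

html = """<ul class="Nav_fm">
  <li class="MenuLevel_0"><a>A</a>
    <ul>
      <li class="MenuLevel_1"><a href="/a/apples">Apples</a></li>
      <li class="MenuLevel_1"><a href="/a/avocados">Avocados</a></li>
    </ul>
  </li>
</ul>"""
parser = AtoZMenuParser()
parser.feed(html)
print(parser.sections)
# [['A', [('Apples', '/a/apples'), ('Avocados', '/a/avocados')]]]
```

With BeautifulSoup itself, the same pairing would just be appending (innerLi.a.string, innerLi.a['href']) instead of the bare string; the open question is only how dexml should render it.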
Here is the generated XML.
<?xml version="1.0" ?>
<AtoZ>
    <Section section="#">
        <Entry>...</Entry>
        <Entry>...</Entry>
        <Entry>...</Entry>
        <Entry>...</Entry>
    </Section>
    ...
    <Section section="Z">
        <Entry>...</Entry>
        <Entry>...</Entry>
        <Entry>...</Entry>
        <Entry>...</Entry>
    </Section>
</AtoZ>
I'd like each <Entry> to carry an href attribute, i.e. <Entry href="...">...</Entry>.
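To pin down the target shape independently of dexml, here is a Python 3 sketch using the standard library's xml.etree.ElementTree (an alternative serializer, not dexml and not the approach in the answer below). The hypothetical render_atoz function and its input format, a list of (section_name, [(entry_text, href), ...]) tuples, are illustrative choices.

```python
import xml.etree.ElementTree as ET

def render_atoz(sections):
    """Hypothetical helper: build
    <AtoZ><Section section="..."><Entry href="...">text</Entry>...</Section></AtoZ>
    from a list of (section_name, [(entry_text, href), ...]) tuples.
    """
    root = ET.Element("AtoZ")
    for name, entries in sections:
        # The section name becomes an attribute, matching the dexml output above.
        section = ET.SubElement(root, "Section", section=name)
        for text, href in entries:
            # href goes on the tag as an attribute; the link text becomes the element text.
            entry = ET.SubElement(section, "Entry", href=href)
            entry.text = text
    return ET.tostring(root, encoding="unicode")

print(render_atoz([("A", [("Apples", "/a/apples"), ("Avocados", "/a/avocados")])]))
# <AtoZ><Section section="A"><Entry href="/a/apples">Apples</Entry><Entry href="/a/avocados">Avocados</Entry></Section></AtoZ>
```

In dexml terms this corresponds to making each entry a model of its own with an href attribute field and the link text as content, rather than a plain string.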
Try redefining Section.entries as a list of Entry models, like this:
class Entry(dexml.Model):
    href = fields.String()
    ...

class Section(dexml.Model):
    section = fields.String()
    entries = fields.List(fields.Model(Entry), tagname='Entry')
Have a look at the dexml test code: beyond what the documentation covers, it contains plenty of good examples of how to use the library.