Using getElementsByTagName from xml.dom.minidom
I'm going through Asheesh Laroia's "Scrape the Web" presentation from PyCon 2010 and I have a question about one particular line of code:
title_element = parsed.getElementsByTagName('title')[0]
from the function:
import xml.dom.minidom

def main(filename):
    # Parse the file
    parsed = xml.dom.minidom.parse(open(filename))
    # Get title element
    title_element = parsed.getElementsByTagName('title')[0]
    # Print just the text underneath it
    print(title_element.firstChild.wholeText)
I don't know what role '[0]' is performing at the end of that line. Does 'xml.dom.minidom.parse' parse the input into a list?
parse() does not return a list; getElementsByTagName() does. You're asking for all elements with a tag of <title>. Most tags can appear multiple times in a document, so when you ask for those elements, you may get more than one. The obvious way to return them is as a list or tuple.
In this case you expect only one <title> tag in the document, so you just take the first element in the list.
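As a minimal sketch of the point above, the snippet below parses a made-up document with two <title> elements and shows what '[0]' selects. The sample XML string and the variable names are illustrative assumptions, not taken from the presentation.

```python
import xml.dom.minidom

# Hypothetical sample document containing two <title> elements.
SAMPLE = "<page><title>First</title><section><title>Second</title></section></page>"

parsed = xml.dom.minidom.parseString(SAMPLE)

# getElementsByTagName() collects every matching descendant,
# not just the first one it finds.
titles = parsed.getElementsByTagName('title')
print(len(titles))  # 2

# [0] picks the first match out of that list-like result.
title_element = titles[0]
print(title_element.firstChild.wholeText)  # First
```

Without the '[0]', title_element would be the whole collection, and the .firstChild lookup would fail because the collection itself has no such attribute.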
This method's (getElementsByTagName) documentation says:
Search for all descendants (direct children, children's children, etc.) with a particular element type name.
Since it mentions "all descendants", then yes, in all likelihood it returns a list, which this code simply indexes to get the first element.
Looking at the code of this method (in Lib/xml/dom/minidom.py), it indeed returns a list.
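One way to check this without reading the source is to inspect the returned object directly. The sketch below assumes CPython's standard library, where the result is a NodeList that subclasses the built-in list. The sample XML and the 'nosuchtag' name are made up for illustration.

```python
import xml.dom.minidom

# Hypothetical one-title document.
parsed = xml.dom.minidom.parseString("<page><title>Example</title></page>")

result = parsed.getElementsByTagName('title')
print(type(result).__name__)     # NodeList (in CPython)
print(isinstance(result, list))  # True

# When nothing matches, the result is empty, so a bare [0] would raise
# IndexError. Guarding the lookup avoids that.
missing = parsed.getElementsByTagName('nosuchtag')
first = missing[0] if missing else None
print(first)  # None
```

The guard matters for scraped pages, where an expected tag such as <title> is not guaranteed to be present.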