
Python lxml.etree - Is it more effective to parse XML from string or directly from link?

With Python's lxml.etree, is it more efficient to parse XML directly from a link to an online XML file, or is it better to use a different library (such as urllib2) to fetch the document as a string and then parse that string? Or does it make no difference at all?

Method 1 - Parse directly from link

from lxml import etree as ET

parsed = ET.parse(url_link)

Method 2 - Parse from string

from lxml import etree as ET
import urllib2

xml_string = urllib2.urlopen(url_link).read()
parsed = ET.fromstring(xml_string)

# note: the call is ET.fromstring(), not ET.parse.fromstring();
# it returns the root element rather than an ElementTree

Or is there a more efficient method than either of these, e.g. saving the XML to a .xml file on the desktop and then parsing from that?

I ran the two methods with a simple timing wrapper.
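The timing wrapper itself is not shown in the answer. A minimal sketch of what such a decorator might look like follows; the `durations_ms` attribute and the output format are assumptions for illustration, not the author's actual code:

```python
import functools
from timeit import default_timer


def timing(func):
    """Print how long each call to func takes, in milliseconds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = default_timer()
        result = func(*args, **kwargs)
        elapsed_ms = (default_timer() - start) * 1000.0
        # Keep every measurement so an average can be computed afterwards
        # (durations_ms is a hypothetical attribute, not part of the answer).
        wrapper.durations_ms.append(elapsed_ms)
        print("%s took %.4f ms" % (func.__name__, elapsed_ms))
        return result
    wrapper.durations_ms = []
    return wrapper
```

After a loop of 100 calls, the average would then be `sum(f.durations_ms) / len(f.durations_ms)` for a decorated function `f`.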

Method 1 - Parse XML Directly From Link

from lxml import etree as ET

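# url_link holds the URL of the XML document; @timing is the timing wrapper
@timing
def parseXMLFromLink():
    # lxml fetches and parses the document in one step
    parsed = ET.parse(url_link)
    print parsed.getroot()

# time 100 runs
for n in range(0,100):
    parseXMLFromLink()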

Average of 100 = 98.4035 ms

Method 2 - Parse XML From String Returned By urllib2

from lxml import etree as ET
import urllib2

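@timing
def parseXMLFromString():
    # download the whole document first, then parse the resulting string
    xml_string = urllib2.urlopen(url_link).read()
    parsed = ET.fromstring(xml_string)  # returns the root element
    print parsed

# time 100 runs
for n in range(0,100):
    parseXMLFromString()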

Average of 100 = 286.9630 ms

So anecdotally it seems that parsing directly from the link with lxml is the quicker method. It's not clear whether downloading large XML documents and then parsing them from the hard drive would be faster, but presumably, unless the document is huge and the parsing task more intensive, parseXMLFromLink() would remain quicker, since it is urllib2 that seems to slow the second function down.

I ran this a few times and the results stayed the same.
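The open question about saving the document to disk first can be checked locally without any network access. A minimal sketch, assuming the XML is already in memory as bytes; `parse_in_memory`, `parse_via_disk` and the sample document are hypothetical names for illustration, and the import falls back to the standard library's ElementTree when lxml is not installed:

```python
import os
import tempfile

try:
    from lxml import etree as ET  # the library used in the answer
except ImportError:
    # Assumption: the stdlib module is close enough for the calls used here.
    import xml.etree.ElementTree as ET


def parse_in_memory(xml_bytes):
    """Parse bytes that are already in memory; returns the root element."""
    return ET.fromstring(xml_bytes)


def parse_via_disk(xml_bytes):
    """Write the bytes to a temporary .xml file, then parse that file."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(xml_bytes)
        # ET.parse() returns an ElementTree, so take its root for comparison.
        return ET.parse(path).getroot()
    finally:
        os.remove(path)


# Hypothetical sample document standing in for the downloaded XML.
sample = b"<feed><item>one</item><item>two</item></feed>"
root_a = parse_in_memory(sample)
root_b = parse_via_disk(sample)
```

Both functions yield the same tree. The disk route adds a write and a read on top of the parse, so for a one-off parse it seems unlikely to be faster, though that is worth measuring rather than assuming.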

If by 'effective' you mean 'efficient', I'm relatively certain you will see no difference between the two at all (unless ET.parse(link) is horribly implemented).

The reason is that network time is going to be the most significant part of parsing an online XML file: much longer than storing the file to disk or keeping it in memory, and much longer than actually parsing it.
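One way to test this claim is to time the download and the parse separately instead of timing them together. A rough sketch, assuming a hypothetical `time_fetch_and_parse` helper that takes any zero-argument callable returning XML bytes; the `fake_fetch` stand-in is there only so the example runs offline:

```python
from timeit import default_timer

try:
    from lxml import etree as ET  # the library used in the question
except ImportError:
    # Assumption: stdlib fallback, which also provides fromstring().
    import xml.etree.ElementTree as ET


def time_fetch_and_parse(fetch):
    """Call fetch() to get XML bytes, parse them, and time each step.

    Returns (root, fetch_ms, parse_ms).
    """
    start = default_timer()
    xml_bytes = fetch()
    fetched = default_timer()
    root = ET.fromstring(xml_bytes)
    parsed = default_timer()
    return root, (fetched - start) * 1000.0, (parsed - fetched) * 1000.0


def fake_fetch():
    """Stand-in for a real download; returns a tiny in-memory document."""
    return b"<feed><item>one</item></feed>"


root, fetch_ms, parse_ms = time_fetch_and_parse(fake_fetch)
print("fetch: %.4f ms, parse: %.4f ms" % (fetch_ms, parse_ms))
```

For a real measurement, `fetch` could be something like `lambda: urllib2.urlopen(url_link).read()` on Python 2, or the same call through `urllib.request` on Python 3. If the answer's reasoning holds, `fetch_ms` should dwarf `parse_ms` for typical documents.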

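Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or the original address. For any questions, contact yoyou2525@163.com.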

 