
Python XML Parse Hang?

EDIT:

I have a script that parses a sitemap XML and stores the loc values from a first pass in an array. I then have it refresh, parse, and store the desired XML tag into a second array to check for any differences. This second array is rebuilt every 3 seconds as the XML is re-fetched. However, the script seems to get hung up, and I am wondering what the problem is.

import urllib,time
from time import gmtime, strftime
from xml.dom import minidom
url='http://kutoa.com/sitemap_products_1.xml?from=1&to=999999999'
def main():
    primList=[]
    secList=[]
    xml = urllib.urlopen(url).read()
    xmldoc = minidom.parseString(xml)
    loc_values = xmldoc.getElementsByTagName('loc')
    for loc_val in loc_values:
        item=(loc_val.firstChild.nodeValue)
        primList.append(item)
    for i in primList:
        secList.append(i)
    while len(secList)==len(primList):
        print str(strftime("%Y-%m-%d %H:%M:%S", gmtime()))+' :: '+str(len(secList)) +' items indexed...'
        print 'destruct list'
        secList=[]
        print 'empty list/reading url'
        xml = urllib.urlopen(url).read()
        print 'url read/parsing'
        xmldoc = minidom.parseString(xml)
        print 'parsed going for tags'
        loc_values = xmldoc.getElementsByTagName('loc')
        print 'adding tags'
        for loc_val in loc_values:
            item=(loc_val.firstChild.nodeValue)
            secList.append(item)
        print 'tags added to list'
        time.sleep(3)
        print 'sleep for 3\n'
    if len(primList)>len(secList):
        print 'items removed'
        main()
    elif len(secList)>len(primList):
        print 'items added'
        main()
main()

With print statements added for troubleshooting, I see that it gets hung up on opening the URL. Here is some recent output:

2015-12-26 18:30:21 :: 7 items indexed...
destruct list
empty list/reading url
url read/parsing
parsed going for tags
adding tags
tags added to list
sleep for 3

2015-12-26 18:30:24 :: 7 items indexed...
destruct list
empty list/reading url
url read/parsing
parsed going for tags
adding tags
tags added to list
sleep for 3

2015-12-26 18:30:27 :: 7 items indexed...
destruct list
empty list/reading url

and then nothing more is output; the program just hangs, un-terminated, after the last parse output. Is this network related? Any thoughts/remedies would be greatly appreciated!

At the beginning of your function, before calling urlopen, you might want to set the socket timeout to prevent the call from potentially hanging forever. This snippet sets the timeout to 3 seconds for consistency with your sleep value:

import socket

def main():
    socket.setdefaulttimeout(3)
    ...
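To confirm the default takes effect: any socket created after the setdefaulttimeout call inherits it. A quick check (behaves the same in Python 2 and 3):

```python
import socket

# Process-wide default timeout, in seconds, for all sockets
# created from this point on.
socket.setdefaulttimeout(3)

# A socket created afterwards inherits the default.
s = socket.socket()
timeout = s.gettimeout()
print(timeout)  # 3.0
s.close()
```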

Then, wrap your call to urlopen to catch the socket.timeout exception. This snippet just prints a string and continues your loop:

try:
    xml = urllib.urlopen(url).read()
except socket.timeout as e:
    print 'timeout reading url: %s' % e
    continue
print 'url read/parsing'
...

I haven't tested this, so let me know how it goes for you.
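As an alternative sketch, assuming you are willing to switch from the urllib module to urllib2: urllib2.urlopen accepts a per-call timeout argument (since Python 2.6), which limits only that request instead of changing the process-wide default. The same pattern works in Python 3 via urllib.request:

```python
import socket

try:
    import urllib2 as urlrequest            # Python 2
except ImportError:
    import urllib.request as urlrequest     # Python 3

def fetch(url, timeout=3):
    """Fetch a URL with a per-call timeout; return None on timeout/error."""
    try:
        return urlrequest.urlopen(url, timeout=timeout).read()
    except socket.timeout:
        # read() timed out mid-transfer
        return None
    except urlrequest.URLError:
        # connect failed or timed out before any data arrived
        return None
```

In your loop you would then check the return value and continue when it is None, instead of letting the script hang on a slow or unresponsive server.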
