In Python - Parsing a response XML and finding a specific text value

I'm new to Python and I'm having a particularly difficult time working with XML in Python. I'm trying to count the number of times a word appears in an XML document. That would be simple enough, but the XML document is a response from a server. Is it possible to do this without writing to a file? It would be great to do it from memory.

Here is a sample of the XML:

<xml>
  <title>Info</title>
    <foo>aldfj</foo>
      <data>Text I want to count</data>
</xml>

Here is what I have in Python:

import urllib2
import StringIO
import xml.dom.minidom
from xml.etree.ElementTree import parse
usock = urllib.urlopen('http://www.example.com/file.xml') 
xmldoc = minidom.parse(usock)
print xmldoc.toxml()

Past this point I have tried using StringIO, ElementTree, and minidom with no success, and I'm not sure what else to try.

Any help would be greatly appreciated.

It's quite simple, as far as I can tell:

import urllib2
from xml.dom import minidom

usock = urllib2.urlopen('http://www.example.com/file.xml') 
xmldoc = minidom.parse(usock)

for element in xmldoc.getElementsByTagName('data'):
  print element.firstChild.nodeValue

So to count the occurrences of a string, try this (a bit condensed, but I like one-liners):

# str.count returns the number of occurrences; str.find would return an index instead
count = sum(element.firstChild.nodeValue.count('substring') for element in xmldoc.getElementsByTagName('data'))
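
For anyone on Python 3, here is a minimal sketch of the same idea, working entirely from a string in memory. The helper name `count_in_data` and the `SAMPLE` constant are made up for illustration, and the guard for empty elements is my own addition, not part of the one-liner above.

```python
from xml.dom import minidom

# Sample document from the question, held in memory as a string.
SAMPLE = """<xml>
  <title>Info</title>
  <foo>aldfj</foo>
  <data>Text I want to count</data>
</xml>"""


def count_in_data(xml_text, needle):
    """Count non-overlapping occurrences of needle in the first text node
    of every <data> element (hypothetical helper, for illustration only)."""
    doc = minidom.parseString(xml_text)
    total = 0
    for element in doc.getElementsByTagName('data'):
        node = element.firstChild
        # An empty element such as <data/> has no firstChild, so skip it.
        if node is not None and node.nodeType == node.TEXT_NODE:
            total += node.nodeValue.count(needle)
    return total


print(count_in_data(SAMPLE, 'count'))
```

Note that this only looks at the first child of each element, like the one-liner does, so text that follows a nested tag inside `<data>` would not be counted.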

If you are just trying to count the number of times a word appears in an XML document, just read the document as a string and do a count:

import urllib2
data = urllib2.urlopen('http://www.example.com/file.xml').read()
print data.count('foobar')
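
One caveat with counting on the raw string: it also matches tag names and attribute values, not just the text content. A small Python 3 sketch of the difference follows; the `body` bytes are an invented stand-in for whatever the server actually returns.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the bytes a call like urlopen(...).read() would return.
body = b"<xml><title>Info</title><data>data about data</data></xml>"

# Counting on the raw bytes also matches the <data> and </data> tags.
raw_count = body.count(b"data")

# Counting only in the text content, ignoring the markup.
root = ET.fromstring(body)
text_count = sum(text.count("data") for text in root.itertext())

print(raw_count, text_count)
```

If the word you are counting can never appear as a tag or attribute name, the raw count is probably good enough.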

Otherwise, you can just iterate through the tags you are looking for:

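import urllib2
from xml.etree import cElementTree as ET

xml = ET.fromstring(urllib2.urlopen('http://www.example.com/file.xml').read())
# iter() replaces the deprecated getiterator() from Python 2.7 onward
for data in xml.iter('data'):
    # do something with the element's text, e.g. print it
    print data.text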
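
On the "without writing to a file" part: ElementTree can parse any file-like object, so nothing has to touch the disk. Here is a rough Python 3 sketch that uses `io.BytesIO` as a stand-in for the server response; `data_texts` and `fake_response` are names I made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def data_texts(source):
    """Return the text of every <data> element read from a file-like object
    (hypothetical helper, for illustration only)."""
    tree = ET.parse(source)
    # el.text is None for an empty element, so fall back to an empty string.
    return [el.text or "" for el in tree.getroot().iter("data")]


# Stand-in for a response object; anything with a read() method should work.
fake_response = io.BytesIO(
    b"<xml><title>Info</title><foo>aldfj</foo>"
    b"<data>Text I want to count</data></xml>"
)

print(data_texts(fake_response))
```

In real use you would presumably pass the object returned by `urllib.request.urlopen(...)` instead of the `BytesIO` buffer.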

Does this help?

from xml.etree.ElementTree import XML

txt = """<xml>
           <title>Info</title>
           <foo>aldfj</foo>
           <data>Text I want to count</data>
         </xml>"""

# this will give us the contents of the data tag.
data = XML(txt).find("data").text

# ... so here we could do whatever we want
print data

In the code below, just replace the string 'count' with whatever word you want to count. The code counts single words, so if you want to count phrases you'll have to adapt it. In any case, the way to get at all the embedded text is XML('<your xml string here>').itertext():

from xml.etree.ElementTree import XML
from re import findall

txt = """<xml>
        <title>Info</title>
        <foo>aldfj</foo>
        <data>Text I want to count</data>
    </xml>"""

# raw string so that \w reaches the regex engine unchanged
sum([len(filter(lambda w: w == 'count', findall(r'\w+', t))) for t in XML(txt).itertext()])
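
On Python 3, `filter` returns an iterator, so calling `len` on it raises a `TypeError`. A minimal Python 3 sketch of the same word count is below; the function name `count_word` and the `TXT` constant are my own, and the match is case-sensitive, just like the one-liner.

```python
import re
from xml.etree.ElementTree import XML

TXT = """<xml>
        <title>Info</title>
        <foo>aldfj</foo>
        <data>Text I want to count</data>
    </xml>"""


def count_word(xml_text, word):
    """Count whole-word, case-sensitive matches of word across all text
    in the document (hypothetical helper, for illustration only)."""
    return sum(
        1
        for text in XML(xml_text).itertext()   # every piece of embedded text
        for w in re.findall(r"\w+", text)      # split that text into words
        if w == word
    )


print(count_word(TXT, "count"))
```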
