Extract text from value tag Etree XML python

I want to extract the text from the value tags. My XML fragment and my attempt are given below:

<datas>
  <data>
    <column datatype='string' name='[Sub-Category (group)]' role='dimension' type='nominal'>
      <calculation class='categorical-bin' column='[Product Sub-Category]' new-bin='false'>
        <bin value='&quot;Envelopes&quot;'>
          <value>&quot;Envelopes&quot;</value>
          <value>&quot;Labels&quot;</value>
          <value>&quot;Pens &amp; Art Supplies&quot;</value>
          <value>&quot;Rubber Bands&quot;</value>
          <value>&quot;Scissors, Rulers and Trimmers&quot;</value>
        </bin>
      </calculation>
   </column>      
</data>
</datas>

My attempt:

root = 'myxmlfile.xml'
valuelist = []
for i in root.findall('./datas/data/column/calculation/bin')
    val  = i.find('value')
    if val:
       for j in val:
           valuelist.append(j.text)
I didn't get the proper result.

This might help:

# -*- coding: utf-8 -*-
s = """<datas>
  <data>
<column datatype='string' name='[Sub-Category (group)]' role='dimension' type='nominal'>
              <calculation class='categorical-bin' column='[Product Sub-Category]' new-bin='false'>
                <bin value='&quot;Envelopes&quot;'>
                  <value>&quot;Envelopes&quot;</value>
                  <value>&quot;Labels&quot;</value>
                  <value>&quot;Pens &amp; Art Supplies&quot;</value>
                  <value>&quot;Rubber Bands&quot;</value>
                  <value>&quot;Scissors, Rulers and Trimmers&quot;</value>
                </bin>
              </calculation>
    </column>
 </data>
</datas>"""

import xml.etree.ElementTree as et
tree = et.fromstring(s)
for i in tree.findall('.//data/column/calculation/bin'):
    for j in i.findall('value'):
        print(j.text)

Output:

"Envelopes"
"Labels"
"Pens & Art Supplies"
"Rubber Bands"
"Scissors, Rulers and Trimmers"

Try this:

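import xml.etree.ElementTree as ET

# ET.parse() accepts a path directly, so the file does not need to be opened by hand
doc = ET.parse('/your/path_to_file/data.xml').getroot()
valuelist = []
for i in doc.findall('.//bin'):
    # findall() returns every <value> child of this <bin>
    for v in i.findall('value'):
        valuelist.append(v.text)
print(valuelist)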

Output:

['"Envelopes"', '"Labels"', '"Pens & Art Supplies"', '"Rubber Bands"', '"Scissors, Rulers and Trimmers"']

Rakesh's answer is great; I just thought I'd add a bit of explanation of why your code wasn't working.

To begin with, you need to convert your XML into an ElementTree. This is basically a Python object with a tree-like structure of elements and subelements that corresponds to your XML and that you can then work with in Python.

If your XML is in a file (rather than just a string within your code), you can do:

tree = ET.parse('myxmlfile.xml')

The root is then the "outermost" element of this tree, which you need to get hold of in order to work your way around the tree and find elements:

root = tree.getroot()

(If you do ET.fromstring(s), this returns the root element directly, so you don't need the getroot step.)
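The difference between the two entry points can be checked with a small sketch. The one-line XML string here is a stand-in for the real document, and io.StringIO is used only so that the example does not depend on a file on disk:

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in document (an assumption, not the asker's full XML)
xml_text = "<datas><data><column/></data></datas>"

# ET.parse() takes a filename or file object and returns an ElementTree,
# so getroot() is needed to reach the outermost element
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

# ET.fromstring() returns the root Element directly
root_from_string = ET.fromstring(xml_text)

print(type(tree).__name__)   # ElementTree
print(root_from_file.tag)    # datas
print(root_from_string.tag)  # datas
```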

In your example, root is the datas element, which was one of your problems: your path doesn't need to include 'datas', as that's where you're already starting from.
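A minimal sketch of that point, using a shortened version of the structure (the single-letter values are placeholders): a path that starts with 'datas' looks for a datas child of the datas root and finds nothing.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the asker's XML; root is the <datas> element
root = ET.fromstring(
    "<datas><data><column><calculation><bin>"
    "<value>a</value><value>b</value>"
    "</bin></calculation></column></data></datas>"
)

# Paths are relative to the element findall() is called on
with_datas = root.findall('./datas/data/column/calculation/bin')
without_datas = root.findall('./data/column/calculation/bin')

print(len(with_datas))     # 0
print(len(without_datas))  # 1
```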

val = i.find('value') will only return the first value element, not a list of all the value elements, which is what you want. So when you do for j in val, Python is actually iterating over the subelements of that one value element (which don't exist), so there is nothing to append to valuelist. You need to use findall() here, and if you combine this with a for loop, you don't need the if val check, as the for loop simply won't run if findall() comes back empty.
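The behaviour of find() versus findall() can be seen on a cut-down bin element (the values a, b, c are placeholders). As a side note, the ElementTree documentation cautions that an element with no subelements tests as false, so `if val:` is not a reliable existence check; comparing against None is the documented alternative.

```python
import xml.etree.ElementTree as ET

# Cut-down <bin> with placeholder values
bin_elem = ET.fromstring(
    "<bin><value>a</value><value>b</value><value>c</value></bin>"
)

# find() returns only the first match (or None if there is no match)
first = bin_elem.find('value')
print(first.text)   # a
print(list(first))  # [] because <value> has no child elements to iterate over

# findall() returns a list of every direct child that matches
all_values = bin_elem.findall('value')
print([v.text for v in all_values])  # ['a', 'b', 'c']

# Use "is None" rather than truthiness to check whether find() matched
missing = bin_elem.find('no-such-tag')
print(missing is None)  # True
```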

Putting all this together:

import xml.etree.ElementTree as ET

tree = ET.parse('myxmlfile.xml')  # change to wherever your file is located
root = tree.getroot()

binlist = []
for i in root.findall('./data/column/calculation/bin'):
    valuelist = []
    for j in i.findall('value'):
        valuelist.append(j.text)
    binlist.append(valuelist)

binlist is then a list, with each item in the list being a list of the values for that bin.
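As an illustration of the resulting shape, here is a sketch with two hypothetical bins, written as a nested list comprehension (an equivalent alternative to the explicit loops, not a required change):

```python
import xml.etree.ElementTree as ET

# Stand-in document with two bins; the attribute and text values are made up
root = ET.fromstring(
    "<datas><data><column><calculation>"
    "<bin value='first'><value>a</value><value>b</value></bin>"
    "<bin value='second'><value>c</value></bin>"
    "</calculation></column></data></datas>"
)

# One inner list per <bin>, holding the text of each of its <value> children
binlist = [
    [v.text for v in b.findall('value')]
    for b in root.findall('./data/column/calculation/bin')
]

print(binlist)  # [['a', 'b'], ['c']]
```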

If you only have one bin, then you can simplify the second half of the code:

import xml.etree.ElementTree as ET

tree = ET.parse('myxmlfile.xml')  # change to wherever your file is located
root = tree.getroot()

bin = root.find('./data/column/calculation/bin')
valuelist = []
for j in bin.findall('value'):
    valuelist.append(j.text)

Note that I've used ET rather than et for the ElementTree import (this seems to be the convention). This also assumes that datas is the root element of your XML. If the snippet you've given is nested inside a bigger XML file, you'll need to get to that element first by doing something like:

bin = root.find('<path to bin element>')
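For the nested case, one option is a './/' path, which searches all descendants rather than only direct children. This is a sketch under the assumption that the fragment sits inside some larger document; the wrapper element names workbook and datasources are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical larger document wrapping the <datas> fragment
root = ET.fromstring(
    "<workbook><datasources><datas><data><column><calculation>"
    "<bin><value>a</value><value>b</value></bin>"
    "</calculation></column></data></datas></datasources></workbook>"
)

# './/bin' matches the first <bin> at any depth below root
bin_elem = root.find('.//bin')

valuelist = []
if bin_elem is not None:  # find() returns None when nothing matches
    for j in bin_elem.findall('value'):
        valuelist.append(j.text)

print(valuelist)  # ['a', 'b']
```

If the larger file contains several bin elements in different places, a more specific path is probably safer than './/bin', since find() stops at the first match.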

These references might be helpful for you:
