Python lxml: How to split comma-separated data and find a specific value in an XML file?

Question

I have an XML file containing thousands of rows of data. One XML file looks like this:

<logs xmlns="http://www.xxxxxx.org/xxxxxx/1ser">
<data> 0.0,1.0,3.0 </data>
<data> 0.5,2.0,4.0 </data>
<data> 1.0,5.0,10.0 </data>
</logs>

I only need to read one specific tag from each file. From the example XML I need only row three, and only two values from it (the first "column" and the sixth column). The values inside the data tags are comma-separated. Basically, I need to find and print the temperature value for a location that I already know.
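The lookup described above (match the known location in the first column, then read another column of that row) can be sketched in plain Python, independent of the XML parsing. `value_at_location` is a hypothetical helper, and the rows are the three sample strings from the example; column indexes are 0-based, so "the sixth column" of the real file would be index 5, while the shortened example only has indexes 0 to 2.

```python
def value_at_location(rows, location, column):
    """Hypothetical helper: return float(field[column]) of the first row
    whose first field equals `location`, or None if no row matches."""
    for row in rows:
        # Split on commas and drop the whitespace around each field
        fields = [f.strip() for f in row.split(",")]
        # Compare numerically, so "1.0" and "1" both match 1.0
        if float(fields[0]) == location:
            return float(fields[column])
    return None


# The text of the three <data> elements from the example file
rows = [" 0.0,1.0,3.0 ", " 0.5,2.0,4.0 ", " 1.0,5.0,10.0 "]
print(value_at_location(rows, 1.0, 2))  # 10.0
```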

I started with lxml.etree and with code that prints the whole data set:

import lxml.etree as ET
file='data.xml'
tree = ET.parse(file)
root = tree.getroot()
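# The file declares a default namespace, so the tag must be written as {namespace-uri}data
for data in root.iter('{http://www.xxxxxx.org/xxxxxx/1ser}data'):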
    print(data.text)
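Because the root element declares a default namespace (`xmlns="..."`), every element in the file, including each `data`, has its tag in that namespace. In lxml, a bare tag name passed to `iter()` only matches elements with no namespace, so it finds nothing here; the `{namespace-uri}localname` form does match. A small sketch, assuming lxml is installed and using the question's sample document inline instead of `data.xml`:

```python
import lxml.etree as ET

# Same structure as the example file (namespace URI kept as in the question)
xml = b"""<logs xmlns="http://www.xxxxxx.org/xxxxxx/1ser">
<data> 0.0,1.0,3.0 </data>
<data> 0.5,2.0,4.0 </data>
<data> 1.0,5.0,10.0 </data>
</logs>"""

root = ET.fromstring(xml)
NS = "http://www.xxxxxx.org/xxxxxx/1ser"

# A bare tag name only matches elements that are in no namespace
print(len(list(root.iter("data"))))  # 0

# The {namespace-uri}localname form matches the namespaced <data> elements
for data in root.iter("{%s}data" % NS):
    print(data.text)
```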

EDIT 1

After getting advice to use XPath and the split method, I wrote a piece of code that looks like this:

import lxml.etree as ET
file='data.xml'
tree = ET.parse(file)
root = tree.getroot()
ns = {'n': 'http://www.xxxxxx.org/xxxxxx/1ser'}
for data in root.xpath('//n:data[contains(text(), "1.0")]', namespaces=ns):
    print(data.text)

This produces the output 1.0,5.0,10.0
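With the sample rows from the question, this predicate is looser than it looks: XPath's `contains()` is a substring test, and the first row `0.0,1.0,3.0` also contains `1.0`. The sketch below (assuming lxml is installed, with the sample document inline) compares it with a predicate anchored on the first field using `starts-with()` and `normalize-space()`. The trailing comma in `"1.0,"` is there so that a location such as 1.05 would not match.

```python
import lxml.etree as ET

xml = b"""<logs xmlns="http://www.xxxxxx.org/xxxxxx/1ser">
<data> 0.0,1.0,3.0 </data>
<data> 0.5,2.0,4.0 </data>
<data> 1.0,5.0,10.0 </data>
</logs>"""
root = ET.fromstring(xml)
ns = {"n": "http://www.xxxxxx.org/xxxxxx/1ser"}

# contains() is a substring test, so the first row matches as well
loose = root.xpath('//n:data[contains(text(), "1.0")]', namespaces=ns)
print([d.text for d in loose])

# Trim the surrounding whitespace, then require the first field to be 1.0
strict = root.xpath(
    '//n:data[starts-with(normalize-space(text()), "1.0,")]', namespaces=ns
)
print([d.text for d in strict])
```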

Using this method I can search for and get row three based on the location (1.0 m). However, at the moment I'm not able to split the inner text of the tag, and I don't know how to do that.

If I try to split the output mentioned above like this:

datat = data.split(",")

I get an attribute error:

AttributeError: 'lxml.etree._Element' object has no attribute 'split'

I guess this means that lxml has no split method and I need to find another way to do it. If I try to split the output this way:

datat = [i.split(",") for i in data]
print(datat[0])

Printing datat gives just empty brackets, which suggests that this for loop most likely does nothing, and printing datat[0] gives this error, which probably proves that I haven't done it right:

IndexError: list index out of range
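Both errors come from the same mix-up: `data` is an lxml element, not a string. An element has no `split()` method, and iterating over an element yields its child elements. `<data>` has no children, so the list comprehension produces an empty list and indexing it fails. The string lives in the element's `.text` attribute. A minimal sketch, assuming lxml is installed and using a single `<data>` element without the namespace for brevity:

```python
import lxml.etree as ET

data = ET.fromstring("<data> 1.0,5.0,10.0 </data>")

# An element is not a string, so it has no .split() method
print(hasattr(data, "split"))  # False

# Iterating an element yields its child elements; <data> has none,
# so this produces an empty list and datat[0] would raise IndexError
datat = [i.split(",") for i in data]
print(datat)  # []

# The text content is an ordinary str, which does have .split()
print(data.text.split(","))  # [' 1.0', '5.0', '10.0 ']
```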

My desired output after splitting would be '1.0','5.0','10.0', so that I can get my desired value 10.0. After the split, I guess the value can be found by adding two more lines:

# Sixth column = index 5; T is a plain float, so it is printed directly
T = float(datat[5])
print(T)

Does anyone know what is wrong with my splitting methods? I'm not doing it right, and I haven't found any helpful advice via Google yet.

Answer 1

Thanks for the advice about XPath and the split method. Finally, I found a solution that gets the value I'm looking for:

import lxml.etree as ET
file='data.xml'
tree = ET.parse(file)
root = tree.getroot()
ns = {'n': 'http://www.xxxxxx.org/xxxxxx/1ser'}
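for data in root.xpath('//n:data[contains(text(), "1.0")]', namespaces=ns):
    # The element's text content is a str, which can be split
    data_string = data.text
    print(data_string)
    # List of the comma-separated fields
    split_data = data_string.split(',')
    print(split_data)
    # Third field (index 2), converted to a number
    T = float(split_data[2])
    print(T)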
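A somewhat stricter variant of the loop above, as a sketch rather than a drop-in replacement: it matches only the row whose first field is the wanted location, strips whitespace from each field and returns a single value. `read_value`, its parameters and the inline `SAMPLE` document are hypothetical, and lxml is assumed to be installed. lxml's `xpath()` accepts extra keyword arguments as XPath variables, which is what `$loc` uses here.

```python
import io

import lxml.etree as ET

NS = {"n": "http://www.xxxxxx.org/xxxxxx/1ser"}

# Sample document matching the example in the question
SAMPLE = b"""<logs xmlns="http://www.xxxxxx.org/xxxxxx/1ser">
<data> 0.0,1.0,3.0 </data>
<data> 0.5,2.0,4.0 </data>
<data> 1.0,5.0,10.0 </data>
</logs>"""


def read_value(source, location, column):
    """Return field `column` (0-based) of the row whose first field is `location`.

    `source` is a filename or file-like object and `location` is the text of
    the first field, e.g. "1.0". Returns None if no row matches.
    """
    tree = ET.parse(source)
    # $loc is an XPath variable, filled in from the loc= keyword argument
    matches = tree.xpath(
        "//n:data[starts-with(normalize-space(text()), $loc)]",
        namespaces=NS,
        loc=location + ",",
    )
    if not matches:
        return None
    fields = [f.strip() for f in matches[0].text.split(",")]
    return float(fields[column])


print(read_value(io.BytesIO(SAMPLE), "1.0", 2))  # 10.0
```

With the real file, `read_value('data.xml', "1.0", 5)` would read the sixth column instead.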

The problem seems to be that I didn't get a string: data is an element, and it is its text (a str) that needs to be split. Basically, I was missing one line in my edit section:

data_string = data.text
print(data_string)

with output: 1.0,5.0,10.0

This command splits the data:

split_data = data_string.split(',')
print(split_data)

with output: ['\n1.0', '5.0', '10.0\n']
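The fields keep the newlines (or spaces) that surround the text inside the tag. `float()` ignores leading and trailing whitespace, so the conversion works anyway. Stripping each field gives clean strings if they are needed as text. A plain-Python sketch, assuming the text looks like the output shown above:

```python
# Text of <data> as read from the file, including surrounding newlines
data_string = "\n1.0,5.0,10.0\n"

split_data = data_string.split(",")
print(split_data)  # ['\n1.0', '5.0', '10.0\n']

# float() ignores leading/trailing whitespace, so this already works
print(float(split_data[2]))  # 10.0

# Stripping each field gives clean strings
clean = [s.strip() for s in data_string.split(",")]
print(clean)  # ['1.0', '5.0', '10.0']
```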

And finally, this gives me the output I was looking for:

T = float(split_data[2])
print(T)

with output: 10.0

Solution 1 · 2 · 2019-07-10 10:40:47

Python lxml：如何拆分逗号分隔的数据并从 XML 文件中查找特定值？

问题描述

1 个解决方案

解决方案1 2 2019-07-10 10:40:47

解决方案1
2 2019-07-10 10:40:47