Returning multiple node attributes with XPath

Question

I need to get data from an XML document and I'm using XPath. I'm quite new to it, though I'm liking it.

I'm retrieving some nodes based on their attributes like this:

/cesAlign/linkGrp[@targType='s']

Now I'd like to get the value of another attribute of those nodes:

/cesAlign/linkGrp[@targType='s']/@fromDoc

However, this returns the first hit only. I'd like to return the attribute for all nodes that have targType='s'.

I was thinking of looping over the node list and then reading the attribute, something like this:

// Assumes doc (org.w3c.dom.Document) and xpath (javax.xml.xpath.XPath) already exist
XPathExpression expr = xpath.compile("/cesAlign/linkGrp[@targType='s']");
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

// Read the attribute from each matched element
for (int i = 0; i < nl.getLength(); i++) {
    Element linkGrp = (Element) nl.item(i);
    System.out.println(linkGrp.getAttribute("fromDoc"));
}

But I'm not sure if there's a better, more elegant way to do this.

Here's a sample XML:

<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
</cesAlign>

Thanks!

Answer 1

I think you will have to iterate over the matches and fetch the attribute value from each element. Use "//cesAlign/linkGrp[@targType='s' and @fromDoc]" to select the elements. Here is an elegant Python solution using lxml:

#sample XML
xml = """
<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029703.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029704.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" notFromDoc = "1"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" notFromDoc = "2"/>
</cesAlign>
"""
from lxml import etree

root = etree.fromstring(xml)
# Select linkGrp elements that have targType='s' and a fromDoc attribute
matches = root.xpath("//cesAlign/linkGrp[@targType='s' and @fromDoc]")
print("Matches:", len(matches))
for e in matches:
    print(e.attrib["fromDoc"])

The output will be:

Matches: 4
en/C2004310.01029701.xml.gz
en/C2004310.01029702.xml.gz
en/C2004310.01029703.xml.gz
en/C2004310.01029704.xml.gz
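
As a variation on the loop above, the attribute step can be kept in the XPath itself. The following is a minimal sketch, assuming lxml as in this answer; the third linkGrp element (targType="p") and the variable names from_docs and first_only are illustrative additions, not part of the original post. It also shows one possible cause of the "first hit only" behaviour: in XPath 1.0, converting a node-set to a string keeps only the first node in document order.

```python
from lxml import etree

# Sample document: the two linkGrp elements from the question, plus one
# hypothetical element with targType="p" to show that the filter applies.
xml = """
<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
 <linkGrp targType="p" toDoc="mt/other.xml.gz" fromDoc="en/other.xml.gz"/>
</cesAlign>
"""
root = etree.fromstring(xml)

# Node-set result: lxml returns one string per matching attribute node.
from_docs = root.xpath("/cesAlign/linkGrp[@targType='s']/@fromDoc")
for value in from_docs:
    print(value)

# String result: XPath 1.0 string() keeps only the first node in document order.
first_only = root.xpath("string(/cesAlign/linkGrp[@targType='s']/@fromDoc)")
print(first_only)
```

With this sample, the loop prints the two en/... paths of the targType="s" elements, and first_only holds just the first of them. In Java, the corresponding choice would be between XPathConstants.NODESET and XPathConstants.STRING as the requested return type.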

Answer 2

Alternatively, you can get each wanted attribute with a separate XPath expression:

/cesAlign/linkGrp[@targType='s'][$x]/@fromDoc

where $x must be substituted with an integer in the interval:

[1, count(/cesAlign/linkGrp[@targType='s'])]
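
The indexed approach above could be driven from host code roughly as follows. This is only a sketch, assuming Python with lxml (an XPath 1.0 engine) rather than the asker's Java setup; the names total, nth_from_doc and values are illustrative, and $x is bound as an XPath variable instead of being substituted into the string.

```python
from lxml import etree

# Sample document taken from the question (elements written as self-closing).
xml = """
<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
</cesAlign>
"""
root = etree.fromstring(xml)

# count() yields an XPath number, which lxml exposes as a float.
total = int(root.xpath("count(/cesAlign/linkGrp[@targType='s'])"))

# Compile the expression once; $x is supplied as a keyword argument per call.
nth_from_doc = etree.XPath("string(/cesAlign/linkGrp[@targType='s'][$x]/@fromDoc)")

# XPath positions are 1-based, so iterate over the interval [1, total].
values = [nth_from_doc(root, x=i) for i in range(1, total + 1)]
for value in values:
    print(value)
```

This evaluates one expression per match plus one for the count, so it is mainly useful when the engine can only hand back a single string per evaluation.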

If you have an XPath 2.0 engine available, the values of all wanted attributes can be obtained with a single XPath 2.0 expression:

/cesAlign/linkGrp[@targType='s']/@fromDoc/string(.)

When this XPath 2.0 expression is evaluated, the result is a sequence containing the string values of every wanted fromDoc attribute.

2 answers

Answer 1
1 2012-01-15 11:52:33

Answer 2
0 2012-01-15 15:26:45
