
Returning multiple node attributes using XPath

I need to get data from an XML document and I'm using XPath. I'm quite new to it, though I'm liking it.

I'm retrieving some nodes based on their attributes like this:

/cesAlign/linkGrp[@targType='s']

Now I'd like to get the value of another attribute of those nodes:

/cesAlign/linkGrp[@targType='s']/@fromDoc

However, this returns only the first hit. I'd like to return the attribute for all nodes that have targType='s'.

I was thinking of looping over the node list and then reading the attribute, something like this:

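// Assumes doc is a parsed org.w3c.dom.Document and xpath is a javax.xml.xpath.XPath.
XPathExpression expr = xpath.compile("/cesAlign/linkGrp[@targType='s']/@fromDoc");
// NODESET returns every matching attribute node; STRING would return only the first.
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

for (int i = 0; i < nl.getLength(); i++) {
    // Each item is an attribute node, so its value can be read directly.
    System.out.println(nl.item(i).getNodeValue());
}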

But I'm not sure if there's a better and more elegant way to do this.

Here's a sample XML:

<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
</cesAlign>

Thanks!

I think you will have to iterate over the matches and fetch the attribute value from each element. Use "//cesAlign/linkGrp[@targType='s' and @fromDoc]" to select the elements. Here is a Python solution using lxml:

#sample XML
xml = """
<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029703.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029704.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" notFromDoc = "1"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" notFromDoc = "2"/>
</cesAlign>
"""
from lxml import etree
root = etree.fromstring(xml)
# select only the linkGrp elements that have targType='s' and a fromDoc attribute
matches = root.xpath("//cesAlign/linkGrp[@targType='s' and @fromDoc]")
print("Matches:", len(matches))
for e in matches:
    print(e.attrib["fromDoc"])

The output will be:

Matches: 4
en/C2004310.01029701.xml.gz
en/C2004310.01029702.xml.gz
en/C2004310.01029703.xml.gz
en/C2004310.01029704.xml.gz
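
If lxml is not installed, the same iterate-and-read approach can be sketched with Python's standard-library xml.etree.ElementTree. This is a substitution, not the code above: ElementTree implements only a subset of XPath, so the "and" test is written as two chained predicates and the path is relative to the root element. The function name from_doc_values and the shortened sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample document (illustrative file names, not the originals).
SAMPLE = """
<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/a.xml.gz" fromDoc="en/a.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/b.xml.gz" fromDoc="en/b.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/c.xml.gz" notFromDoc="1"/>
 <linkGrp targType="w" toDoc="mt/d.xml.gz" fromDoc="en/d.xml.gz"/>
</cesAlign>
"""


def from_doc_values(xml_text):
    """Return the fromDoc value of every linkGrp with targType='s'."""
    root = ET.fromstring(xml_text)  # root is the <cesAlign> element
    # Chained predicates stand in for "@targType='s' and @fromDoc".
    matches = root.findall("linkGrp[@targType='s'][@fromDoc]")
    return [e.get("fromDoc") for e in matches]


if __name__ == "__main__":
    for value in from_doc_values(SAMPLE):
        print(value)
```

Elements that lack fromDoc, or whose targType is not 's', are filtered out by the predicates, so the list contains only real attribute values.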

Alternatively, you can get each wanted attribute with a separate XPath expression:

/cesAlign/linkGrp[@targType='s'][$x]/@fromDoc 

where $x must be replaced with an integer in the interval:

[1, count(/cesAlign/linkGrp[@targType='s'])]
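
A minimal sketch of this count-then-index approach, assuming Python with lxml as in the earlier answer. The helper name from_doc_by_index and the shortened sample document are hypothetical. Here $x is filled in by string formatting, and count() comes back from lxml as a float, hence the int() conversion.

```python
from lxml import etree

# Shortened sample document (illustrative file names, not the originals).
SAMPLE = """
<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/a.xml.gz" fromDoc="en/a.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/b.xml.gz" fromDoc="en/b.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/c.xml.gz" notFromDoc="1"/>
 <linkGrp targType="w" toDoc="mt/d.xml.gz" fromDoc="en/d.xml.gz"/>
</cesAlign>
"""

BASE = "/cesAlign/linkGrp[@targType='s']"


def from_doc_by_index(xml_text):
    """Evaluate one indexed XPath expression per matching linkGrp."""
    root = etree.fromstring(xml_text)
    # Upper bound of the interval: the number of matching elements.
    total = int(root.xpath("count(%s)" % BASE))
    values = []
    for x in range(1, total + 1):  # XPath positions start at 1
        hits = root.xpath("%s[%d]/@fromDoc" % (BASE, x))
        # hits is empty when the x-th element has no fromDoc attribute.
        values.extend(str(h) for h in hits)
    return values


if __name__ == "__main__":
    for value in from_doc_by_index(SAMPLE):
        print(value)
```

This costs one count() evaluation plus one evaluation per element, so it is mostly useful when the API at hand can only return a single string per expression.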

If you have an XPath 2.0 engine available, the values of all wanted attributes can be obtained with a single XPath 2.0 expression:

/cesAlign/linkGrp[@targType='s']/@fromDoc/string(.)

When this XPath 2.0 expression is evaluated, the result is a sequence containing the string value of every wanted fromDoc attribute.
