Can I dereference an lxml.etree.AncestorsIterator?

I'm using lxml to manipulate a database schema expressed in an XML file. It looks something like this:

<Tables>
<Table name = "table1">
<Columns>
<Column name="COL1">...</Column>
<Column name="COL2">...
    <References>
    <Reference>TABLENAME</Reference>
    </References>
</Column>
</Columns>
</Table>
...
</Tables>

Currently I want to look at the references, and get the table and column names for those references. The following works:

refiter = mytree.iter("Reference")
for r in refiter:
    nameiter = r.iterancestors("Table")
    for n in nameiter:
        tablename = n.get("name")

I don't like this solution, because I know that nameiter can only ever yield a single element: each Reference has exactly one ancestor "Table". It seems that in Python I can only consume an iterator in a loop, which feels silly here. Can I dereference the iterator directly somehow? Or is there an alternative way of getting this information that's more suitable?

You can do it with XPath, which gets both of the ancestors you want:

x = """<?xml version="1.0" encoding="utf-8"?>
<Tables>
<Table name = "table1">
<Columns>
<Column name="COL1">...</Column>
<Column name="COL2">...
    <References>
    <Reference>TABLENAME</Reference>
    </References>
</Column>
</Columns>
</Table>
<Table name = "table2">
<Columns>
<Column name="COL2">...</Column>
<Column name="COL3">...
    <References>
    <Reference>TABLENAME</Reference>
    </References>
</Column>
</Columns>
</Table>
</Tables>"""


import lxml.etree as et

xml = et.fromstring(x)

refs = xml.iter("Reference")
print([(ref.xpath("./ancestor::Table/@name")[0], ref.xpath("./ancestor::Column/@name")[0]) for ref in refs])

Which would give you:

[('table1', 'COL2'), ('table2', 'COL3')]

Or if the Column is always the grandparent:

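refs = xml.iter("Reference")  # recreate the iterator; an earlier pass over refs would have exhausted it
print([(ref.xpath("./ancestor::Table/@name")[0], ref.xpath("./../../@name")[0]) for ref in refs])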
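
One caveat with indexing the XPath result using [0] is that it raises IndexError when a Reference has no matching ancestor. A minimal sketch of one way around that, assuming an empty string is an acceptable stand-in for "not found": wrap the expression in XPath's string() function, which converts an empty node-set to "". The sample document and the helper name names_for are made up for illustration.

```python
import lxml.etree as et

# Hypothetical sample: the second Reference deliberately has no Column ancestor.
doc = et.fromstring(
    '<Tables>'
    '<Table name="table1"><Columns><Column name="COL2">'
    '<References><Reference>TABLENAME</Reference></References>'
    '</Column></Columns></Table>'
    '<Table name="table2"><Reference>TABLENAME</Reference></Table>'
    '</Tables>'
)

def names_for(ref):
    # [1] on the ancestor axis selects the nearest matching ancestor.
    # string(...) yields that attribute's value, or "" if nothing matched.
    table = ref.xpath("string(ancestor::Table[1]/@name)")
    column = ref.xpath("string(ancestor::Column[1]/@name)")
    return table, column

pairs = [names_for(ref) for ref in doc.iter("Reference")]
print(pairs)
```

The trade-off is that a missing ancestor and an ancestor whose name attribute is empty become indistinguishable, so this only fits schemas where names are never empty.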

Using your own logic, you can just call next() on iterancestors():

refs = xml.iter("Reference")


for r in refs:
   print(next(r.iterancestors("Table")).get("name"))
   print(next(r.iterancestors("Column")).get("name"))

Which would give you:

table1
COL2
table2
COL3

Since you are interested only in the first result of the iterator, you can use the built-in next() function to get the first element and avoid the unclear, unnecessary for loop.

xml_string = """
<Tables>
<Table name = "table1">
<Columns>
<Column name="COL1">...</Column>
<Column name="COL2">...
    <References>
    <Reference>TABLENAME</Reference>
    </References>
</Column>
</Columns>
</Table>
<Table name = "table2">
<Columns>
<Column name="COL2">...</Column>
<Column name="COL3">...
    <References>
    <Reference>TABLENAME</Reference>
    </References>
</Column>
</Columns>
</Table>
</Tables>"""


import lxml.etree as ETree

root = ETree.fromstring(bytes(xml_string, 'UTF-8'))

refiter = root.iter('Reference')
for r in refiter:
    nameiter = r.iterancestors('Table')
    name = next(nameiter).get('name')
    print(name)
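
Note that next() raises StopIteration if the iterator is empty, which would happen for a Reference that sits outside any Table. A small sketch of a defensive variant, assuming None is a reasonable result in that case: pass a default as the second argument to next(). The sample document and the helper name first_ancestor_name are hypothetical.

```python
import lxml.etree as et

# Hypothetical sample: the second Reference has no Table ancestor.
doc = et.fromstring(
    '<Tables>'
    '<Table name="table1"><Reference>A</Reference></Table>'
    '<Reference>B</Reference>'
    '</Tables>'
)

def first_ancestor_name(element, tag):
    # next(iterator, default) returns the default instead of raising
    # StopIteration when there is no matching ancestor.
    ancestor = next(element.iterancestors(tag), None)
    # Compare with "is None": an element's truth value depends on whether
    # it has children, so "if ancestor:" would be misleading here.
    return None if ancestor is None else ancestor.get("name")

names = [first_ancestor_name(r, "Table") for r in doc.iter("Reference")]
print(names)
```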

If you want to access results by index, you can build a list from the iterator first.

tables = list(r.iterancestors('Table'))
print(tables[0].get('name'))
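
When indexing into such a list, it helps to know the order: iterancestors() walks from the parent upward toward the root, so index 0 is always the nearest match. A short sketch that makes the order visible by calling iterancestors() without a tag filter (the trimmed-down sample document is made up for illustration):

```python
import lxml.etree as et

# Hypothetical single-table sample mirroring the structure in the question.
doc = et.fromstring(
    '<Tables><Table name="table1"><Columns><Column name="COL2">'
    '<References><Reference>TABLENAME</Reference></References>'
    '</Column></Columns></Table></Tables>'
)

ref = next(doc.iter("Reference"))

# With no tag argument, every ancestor is yielded, nearest first.
chain = [ancestor.tag for ancestor in ref.iterancestors()]
print(chain)
```

For a document like this one the list starts at References and ends at the root Tables element. Building the list costs a full walk to the root, so next() remains the cheaper choice when only the first match is needed.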
