python lxml loop on match get next entry

I'm using lxml to query multiple XML files containing data elements for various products. This section of code takes a list of missing product_ids and queries the XML files for the data elements of those products.

One of my core issues is that every product_id obtained through XPath is checked against every item in the list products_missing_from_postgresql, which takes forever (hours).

How do I restart the for entry in entries loop when a match is found?

Maybe this isn't the right question... if not, what is the right question?

# this code is for testing purposes
for product_number in products_missing_from_postgresql:
    for entry in entries:
        product_id = entry.xpath('@id')[0]

        if product_id != product_number:
            print('************************')
            print('current product: ' + product_id)
            print('no match: ' + product_number)
            print('************************')
        else:
            print('************************')
            print('************************')
            print('product to match: ' + product_number)
            print('matched from entry: ' + product_id)
            print('************************')
            print('************************')

Testing Code output:

************************
************************
product to match: B3F2H-STH 
matched from entry: B3F2H-STH 
************************
************************

************************
current product: B3F2H-STL
no match: B3F2H-STH 
************************

************************
current product: B3F2H-004 
no match: B3F2H-STH 
************************

This code is for production:

for product_number in products_missing_from_postgresql:
    for entry in entries:
        product_id = entry.xpath('@id')[0]

        if product_id != product_number:
            # used for testing
            print('no match: ' + product_number)
        else:
            # the element @id has multiple items linked that I need to acquire.
            missing_products_to_add.append(product_id)

            product_name = entry.xpath('@name')[0]
            missing_products_to_add.append(product_name)

            product_type = entry.xpath('@type')[0]
            missing_products_to_add.append(product_type)

            product_price = entry.xpath('@price')[0]
            missing_products_to_add.append(product_price)

Try putting your IDs into a set and checking each product number against that. This avoids the nested loop and runs the XPath queries only once instead of re-querying the tree for every product:

# collect every id once; set membership tests are fast
ids = {pid for entry in entries for pid in entry.xpath('@id')}
for product_number in products_missing_from_postgresql:
    if product_number in ids:
        pass  # whatever
    else:
        pass  # whatever
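As a rough, self-contained illustration of the set approach, the sketch below runs it against made-up data. The `<catalog>` wrapper, the attribute values, and the `ZZZ-000` product number are assumptions for the example, not taken from the question. It only uses `fromstring`, `iter` and `get`, which lxml shares with the standard library's ElementTree, so it should also run where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical sample data; the real files and element names may differ.
SAMPLE_XML = """<catalog>
  <entry id="B3F2H-STH" name="Widget H" type="widget" price="9.99"/>
  <entry id="B3F2H-STL" name="Widget L" type="widget" price="8.49"/>
  <entry id="B3F2H-004" name="Widget 4" type="gadget" price="4.25"/>
</catalog>"""

root = etree.fromstring(SAMPLE_XML)
entries = list(root.iter("entry"))
products_missing_from_postgresql = ["B3F2H-STH", "ZZZ-000"]

# One pass over the XML builds the set; each later lookup is a hash
# lookup instead of another scan over every entry.
ids = {entry.get("id") for entry in entries if entry.get("id") is not None}

found = [p for p in products_missing_from_postgresql if p in ids]
not_found = [p for p in products_missing_from_postgresql if p not in ids]
print("found:", found)
print("not found:", not_found)
```

With n entries and m missing products, this does roughly n + m steps rather than the n * m comparisons of the nested loop, which is likely where the hours went.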

If you want to also retrieve the elements then you can build a dictionary instead of a set:

# map each entry's id attribute to the entry element itself
products = {entry.get('id'): entry for entry in entries}
for product_number in products_missing_from_postgresql:
    if product_number in products:
        actual_product = products[product_number]
        # ...
    else:
        pass  # ...
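To connect the dictionary idea back to the production code in the question, here is a minimal sketch that pulls `id`, `name`, `type` and `price` for each matched product. The helper name `collect_missing_products`, the sample XML, and the choice to return one tuple per product (rather than a flat list) are assumptions made for illustration. Like the attribute reads above, it relies only on `get`, which lxml and the standard library's ElementTree both provide.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Attribute names taken from the question's production code.
FIELDS = ("id", "name", "type", "price")


def collect_missing_products(entries, product_numbers):
    """Return one (id, name, type, price) tuple per product number found.

    Hypothetical helper: builds the id -> element index once, then looks
    up each requested product number in it.
    """
    by_id = {entry.get("id"): entry for entry in entries}
    rows = []
    for product_number in product_numbers:
        entry = by_id.get(product_number)
        if entry is None:
            continue  # not present in the XML
        rows.append(tuple(entry.get(field) for field in FIELDS))
    return rows


# Hypothetical sample data for demonstration only.
root = etree.fromstring(
    '<catalog>'
    '<entry id="B3F2H-STH" name="Widget H" type="widget" price="9.99"/>'
    '<entry id="B3F2H-STL" name="Widget L" type="widget" price="8.49"/>'
    '</catalog>'
)
missing_products_to_add = collect_missing_products(
    root.iter("entry"), ["B3F2H-STH", "ZZZ-000"]
)
print(missing_products_to_add)
```

If the same id can appear in more than one file, the dictionary keeps only the last element seen for that id, so the index would need to store lists if every occurrence matters.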

Instead of using an inner for loop, use XPath.

for product_number in products_missing_from_postgresql:
    entries = xml_tree.xpath("//entry[@id = '%s']" % product_number)
    if entries:
        print('FOUND: ' + product_number)
    else:
        print('NOT FOUND: ' + product_number)

If your product_number can contain single quotes, the above will break. It's generally preferable to use a placeholder in the XPath and pass the actual value separately:

    entries = xml_tree.xpath("//entry[@id = $value]", value=product_number)
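A small sketch of the placeholder form, using a product number that contains a single quote. The sample XML and the `lookup` helper are made up for the example. It also precompiles the expression with lxml's `etree.XPath` so it is parsed once rather than on every iteration, which is an optional addition to the approach above.

```python
from lxml import etree

# Hypothetical sample data; one id deliberately contains a single quote.
xml_tree = etree.fromstring(
    '<catalog>'
    '<entry id="B3F2H-STH" name="Widget H"/>'
    '<entry id="O\'Neil-01" name="Quoted id"/>'
    '</catalog>'
)

# Compile once; $value is filled in on each call, so no string
# formatting (and no quoting problem) is involved.
find_entry = etree.XPath("//entry[@id = $value]")


def lookup(product_number):
    """Return the list of <entry> elements whose id equals product_number."""
    return find_entry(xml_tree, value=product_number)


names = [entry.get("name") for entry in lookup("O'Neil-01")]
missing = lookup("ZZZ-000")
print(names, missing)
```

Note that each call still searches the tree, so for a long list of product numbers the set or dictionary approach is likely to be faster.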
