I'm using lxml to query multiple XML files containing data elements on various products. This section of code takes a list of missing product_ids and queries the XML files for each product's data elements.
One of my core issues is that every product_id obtained through XPath is checked against every item in the list products_missing_from_postgresql, which takes forever (hours).
How do I restart the for entry in entries loop when a match is found?
Maybe this isn't the right question... if not, what is the right question?
# this code is for testing purposes
for product_number in products_missing_from_postgresql:
    for entry in entries:
        product_id = entry.xpath('@id')[0]
        if product_id != product_number:
            print('************************')
            print('current product: ' + product_id)
            print('no match: ' + product_number)
            print('************************')
        else:
            print('************************')
            print('************************')
            print('product to match: ' + product_number)
            print('matched from entry: ' + product_id)
            print('************************')
            print('************************')
Testing code output:
************************
************************
product to match: B3F2H-STH
matched from entry: B3F2H-STH
************************
************************
************************
current product: B3F2H-STL
no match: B3F2H-STH
************************
************************
current product: B3F2H-004
no match: B3F2H-STH
************************
This code is for production:
for product_number in products_missing_from_postgresql:
    for entry in entries:
        product_id = entry.xpath('@id')[0]
        if product_id != product_number:
            # used for testing
            print('no match: ' + product_number)
        else:
            # the entry with this @id has several linked attributes I need to acquire
            missing_products_to_add.append(product_id)
            product_name = entry.xpath('@name')[0]
            missing_products_to_add.append(product_name)
            product_type = entry.xpath('@type')[0]
            missing_products_to_add.append(product_type)
            product_price = entry.xpath('@price')[0]
            missing_products_to_add.append(product_price)
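For the literal question (stop scanning entries as soon as a match is found and move on to the next product_number), Python's break ends the inner loop early, and a for ... else clause runs only when the loop finished without a break. A minimal sketch, assuming each entry is an element with id, name, type and price attributes; the sample XML is made up for illustration, and it uses the standard-library xml.etree.ElementTree so it runs without lxml (lxml elements have the same .get() method). Unlike the original, it appends one list per product instead of flattening everything into one list.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the real XML files.
XML = """<products>
  <entry id="B3F2H-STH" name="Widget" type="tool" price="9.99"/>
  <entry id="B3F2H-STL" name="Gadget" type="tool" price="4.50"/>
  <entry id="B3F2H-004" name="Gizmo" type="part" price="1.25"/>
</products>"""


def find_missing(entries, products_missing_from_postgresql):
    """Return (rows found, ids not found) using an early-exit nested loop."""
    missing_products_to_add = []
    not_found = []
    for product_number in products_missing_from_postgresql:
        for entry in entries:
            if entry.get("id") == product_number:
                # Collect this product's attributes, then stop scanning entries.
                missing_products_to_add.append(
                    [entry.get("id"), entry.get("name"),
                     entry.get("type"), entry.get("price")]
                )
                break
        else:
            # Runs only if the inner loop completed without hitting `break`.
            not_found.append(product_number)
    return missing_products_to_add, not_found


entries = ET.fromstring(XML).findall("entry")
rows, not_found = find_missing(entries, ["B3F2H-STL", "XYZ-999"])
print(rows)       # [['B3F2H-STL', 'Gadget', 'tool', '4.50']]
print(not_found)  # ['XYZ-999']
```

This only shortens each scan: an ID that is not in the XML still costs a full pass over entries, so the worst case remains len(products) × len(entries) comparisons.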
Try putting your IDs into a set and comparing against that. This removes the nested loop and runs the XPath queries only once, instead of re-querying the tree for every product...
ids = {pid for entry in entries for pid in entry.xpath('@id')}
for product_number in products_missing_from_postgresql:
    if product_number in ids:
        pass  # whatever: the product exists in the XML
    else:
        pass  # whatever: the product is not in the XML
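A runnable sketch of the set approach, using made-up sample data. It uses the standard-library xml.etree.ElementTree and entry.get("id") rather than entry.xpath('@id'), because ElementTree elements have no .xpath() method; lxml elements support both.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; in practice `entries` comes from the parsed XML files.
XML = """<products>
  <entry id="B3F2H-STH"/>
  <entry id="B3F2H-STL"/>
  <entry/>
</products>"""

entries = ET.fromstring(XML).findall("entry")
products_missing_from_postgresql = ["B3F2H-STL", "NOPE-1", "B3F2H-STH"]

# One pass over the entries; entries without an id attribute are skipped.
ids = {entry.get("id") for entry in entries if entry.get("id") is not None}

# Each membership test is an average O(1) hash lookup, so the total work is
# roughly len(entries) + len(products_missing_from_postgresql).
found = [p for p in products_missing_from_postgresql if p in ids]
not_found = [p for p in products_missing_from_postgresql if p not in ids]
print(found)      # ['B3F2H-STL', 'B3F2H-STH']
print(not_found)  # ['NOPE-1']
```

This turns the hours-long nested scan into two linear passes, which is why the set is the "right question" here rather than how to restart the inner loop.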
If you also want to retrieve the elements, you can build a dictionary instead of a set:
products = {p.attrib['id']: p for p in entries if 'id' in p.attrib}
for product_number in products_missing_from_postgresql:
    if product_number in products:
        actual_product = products[product_number]
        # ...
    else:
        pass  # ...
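A sketch of the dictionary approach that also pulls out the attributes the question collects (id, name, type, price), assuming they live on the entry element itself. The sample XML is hypothetical, and standard-library ElementTree is used so it runs without lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the real XML files.
XML = """<products>
  <entry id="B3F2H-STH" name="Widget" type="tool" price="9.99"/>
  <entry id="B3F2H-004" name="Gizmo" type="part" price="1.25"/>
</products>"""

entries = ET.fromstring(XML).findall("entry")
products_missing_from_postgresql = ["B3F2H-004", "NOPE-1"]

# Map id -> element in a single pass (a later duplicate id overwrites an earlier one).
products = {e.get("id"): e for e in entries if e.get("id") is not None}

missing_products_to_add = []
for product_number in products_missing_from_postgresql:
    entry = products.get(product_number)
    if entry is None:
        continue  # not present in any of the XML files
    # One tuple per product: (id, name, type, price).
    missing_products_to_add.append(
        tuple(entry.get(attr) for attr in ("id", "name", "type", "price"))
    )

print(missing_products_to_add)  # [('B3F2H-004', 'Gizmo', 'part', '1.25')]
```

Keeping one tuple per product, rather than one flat list, keeps each product's fields together for whatever step inserts them into PostgreSQL.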
Instead of using an inner for loop, use XPath.
for product_number in products_missing_from_postgresql:
    entries = xml_tree.xpath("//entry[@id = '%s']" % product_number)
    if entries:
        print('FOUND: ' + product_number)
    else:
        print('NOT FOUND: ' + product_number)
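If a product_number can contain a single quote, the query above will break, because the quote terminates the string literal inside the XPath expression. It's generally preferable to use an XPath variable ($value) as a placeholder and pass the actual value separately as a keyword argument: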
entries = xml_tree.xpath("//entry[@id = $value]", value=product_number)
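A small sketch contrasting the two forms with lxml, using a hypothetical document in which one ID contains a single quote. lxml binds $value from the value= keyword argument, so the ID's characters never become part of the expression text. Note that each //entry[...] query still scans the whole document, so for many IDs the set or dictionary approach is likely to be faster.

```python
from lxml import etree

# Hypothetical sample document; one id deliberately contains a single quote.
xml_tree = etree.fromstring(
    b"""<products>
  <entry id="B3F2H-STH" name="Widget"/>
  <entry id="O'BRIEN-1" name="Quoted"/>
</products>"""
)


def lookup(product_number):
    # $value is an XPath variable bound from the keyword argument.
    return xml_tree.xpath("//entry[@id = $value]", value=product_number)


for product_number in ["B3F2H-STH", "O'BRIEN-1", "NOPE-1"]:
    entries = lookup(product_number)
    if entries:
        print("FOUND: " + product_number)
    else:
        print("NOT FOUND: " + product_number)

# The %-formatted version produces an invalid expression for the quoted id.
try:
    xml_tree.xpath("//entry[@id = '%s']" % "O'BRIEN-1")
    quote_broke_query = False
except etree.XPathError:
    quote_broke_query = True
print(quote_broke_query)
```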