How to use lxml iterparse with an Azure StorageStreamDownloader?

Question

I currently use lxml.etree.iterparse to iterate over an XML file tag by tag. This works fine locally, but I want to move the XML file to Azure Blob Storage and process it in an Azure Function. However, I'm stuck on how to parse the XML file from the StorageStreamDownloader.

Code locally

from lxml import etree

context = etree.iterparse('c:\\Users\\', tag='InstanceElement')

for event, elem in context:
    # processing of the tag
    pass

Streaming from Blob

from lxml import etree
from azure.storage.filedatalake import DataLakeServiceClient

connect_str = ''
service = DataLakeServiceClient.from_connection_string(conn_str=connect_str)

System = service.get_file_system_client('')
FileClient = System.get_file_client('')
Stream = FileClient.download_file()

# Stuck on what the input must be for iterparse
context = etree.iterparse(, tag='InstanceElement')

for event, elem in context:
    # processing of the tag
    pass

I'm stuck at what the input of iterparse must be, so any ideas on how to parse the XML file while streaming it?

Answer 1

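iterparse accepts a file-like object as well as a file name, so you can download the blob with readall() and wrap the resulting bytes in BytesIO. Note that this loads the whole file into memory before parsing starts: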

from lxml import etree
from azure.storage.filedatalake import DataLakeServiceClient
from io import BytesIO

connect_str = ''
service = DataLakeServiceClient.from_connection_string(conn_str=connect_str)

System = service.get_file_system_client('')
FileClient = System.get_file_client('test.xml')
content = FileClient.download_file().readall()

context = etree.iterparse(BytesIO(content), tag='InstanceElement')
for event, elem in context:
    print(elem.text)

Content of my test.xml:

Result:
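
If the file is too large to hold in memory, one option is to avoid readall() and instead wrap the downloader's chunks() iterator in a small file-like object, since iterparse only needs something with a read() method. The following is a minimal sketch rather than a tested Azure integration: ChunkStream and iter_elements are hypothetical names (not part of lxml or the Azure SDK), and it assumes chunks() yields bytes objects.

```python
import io


class ChunkStream(io.RawIOBase):
    """Read-only file-like view over an iterable of bytes chunks (hypothetical helper)."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._leftover = b""

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull the next non-empty chunk once the previous one is used up.
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0  # no more chunks: signal end of stream
        n = min(len(buffer), len(self._leftover))
        buffer[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


def iter_elements(chunks, tag):
    """Yield each completed element named `tag`, clearing it after the caller is done."""
    from lxml import etree  # imported here so ChunkStream itself needs only the stdlib

    source = io.BufferedReader(ChunkStream(chunks))
    for _event, elem in etree.iterparse(source, tag=tag):
        yield elem
        elem.clear()  # drop the element's text, attributes and children
```

With the FileClient from the code above, this could be called as iter_elements(FileClient.download_file().chunks(), 'InstanceElement'). Calling elem.clear() after each element should keep memory use low, although the emptied elements remain attached to the root until parsing finishes.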

solution1
0 ACCEPTED 2021-04-01 03:21:05
