
How to use lxml iterparse from Azure StorageStreamDownloader?

I'm currently using lxml.etree.iterparse to iterate over an XML file tag by tag. Locally this works fine, but I want to move the XML file to Azure Blob Storage and process it in an Azure Function. However, I'm stuck on how to parse the XML file from the StorageStreamDownloader.

Local code:

from lxml import etree

context = etree.iterparse('c:\\Users\\', tag='InstanceElement')

for event, elem in context:
    # processing of the tag
    pass

Streaming from Blob:

from lxml import etree
from azure.storage.filedatalake import DataLakeServiceClient

connect_str = ''
service = DataLakeServiceClient.from_connection_string(conn_str=connect_str)

System = service.get_file_system_client('')
FileClient = System.get_file_client('')
Stream = FileClient.download_file()

# Stuck on what the input must be for iterparse
context = etree.iterparse(, tag='InstanceElement')

for event, elem in context:
    # processing of the tag

I'm stuck on what the input to iterparse must be. Any ideas on how to parse the XML file while streaming it?

Try this:

from lxml import etree
from azure.storage.filedatalake import DataLakeServiceClient
from io import BytesIO

connect_str = ''
service = DataLakeServiceClient.from_connection_string(conn_str=connect_str)

System = service.get_file_system_client('')
FileClient = System.get_file_client('test.xml')
content = FileClient.download_file().readall()

context = etree.iterparse(BytesIO(content), tag='InstanceElement')
for event, elem in context:
    print(elem.text)

Content of my test.xml:

[image not preserved]

Result:

[image not preserved]
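Note that readall() loads the entire blob into memory before parsing starts, so the code above does not parse while streaming. If the file is large, one possible alternative is to feed the download to a pull parser chunk by chunk. The following is only a sketch under some assumptions: iter_elements is a hypothetical helper name, it accepts any iterable of bytes chunks, and the standard-library fallback import is added purely for portability. It has not been tested against Azure.

```python
try:
    from lxml import etree
except ImportError:  # portability fallback only, not part of the original answer
    import xml.etree.ElementTree as etree


def iter_elements(chunks, tag):
    """Yield each completed element named `tag` from an iterable of bytes chunks.

    `iter_elements` is a hypothetical helper written for illustration.
    """
    parser = etree.XMLPullParser(events=("end",))

    def drain():
        # Hand out every matching element the parser has finished so far.
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                yield elem
                elem.clear()  # release the element's content after use

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()

    parser.close()  # raises a parse error if the document is incomplete
    yield from drain()


if __name__ == "__main__":
    xml = b"<root><InstanceElement>a</InstanceElement></root>"
    pieces = [xml[i:i + 16] for i in range(0, len(xml), 16)]
    for elem in iter_elements(pieces, "InstanceElement"):
        print(elem.text)
```

With the objects from the answer, this could presumably be called as iter_elements(FileClient.download_file().chunks(), 'InstanceElement'), assuming chunks() yields bytes as it does in current versions of the SDK. Each element is cleared after the caller has handled it, so read elem.text (or whatever else is needed) inside the loop rather than collecting the elements for later.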
