How to use lxml iterparse from Azure StorageStreamDownloader?
I'm currently using lxml.etree.iterparse to iterate over an XML file tag by tag. Locally this works fine, but I want to move the XML file to Azure Blob Storage and process it in an Azure Function. However, I'm stuck on how to parse the XML file from the StorageStreamDownloader.
Code locally
from lxml import etree
context = etree.iterparse('c:\\Users\\', tag='InstanceElement')
for event, elem in context:
    # processing of the tag
Streaming from Blob
from lxml import etree
from azure.storage.filedatalake import DataLakeServiceClient
connect_str = ''
service = DataLakeServiceClient.from_connection_string(conn_str=connect_str)
System = service.get_file_system_client('')
FileClient = System.get_file_client('')
Stream = FileClient.download_file()
# Stuck on what the input must be for iterparse
context = etree.iterparse(, tag='InstanceElement')
for event, elem in context:
    # processing of the tag
I'm stuck on what the input of iterparse must be. Any ideas on how to parse the XML file while streaming it?
Try this:
from lxml import etree
from azure.storage.filedatalake import DataLakeServiceClient
from io import BytesIO
connect_str = ''
service = DataLakeServiceClient.from_connection_string(conn_str=connect_str)
System = service.get_file_system_client('')
FileClient = System.get_file_client('test.xml')
content = FileClient.download_file().readall()
context = etree.iterparse(BytesIO(content), tag='InstanceElement')
for event, elem in context:
    print(elem.text)
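Note that readall() loads the whole file into memory before parsing starts, which may not be what you want for a large XML file. Since iterparse accepts any file-like object with a read() method, one possible alternative is to wrap the downloader's chunk iterator in a small adapter. The following is only a sketch: ChunkReader and iter_instance_texts are hypothetical names that are not part of lxml or the Azure SDK, and it assumes the StorageStreamDownloader in your SDK version exposes chunks() as an iterator of bytes.

```python
import io


class ChunkReader(io.RawIOBase):
    """Read-only file-like view over an iterator of bytes chunks (hypothetical helper)."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = b""

    def readable(self):
        return True

    def readinto(self, target):
        # Refill the buffer from the next non-empty chunk when it runs dry.
        while not self._buffer:
            try:
                self._buffer = next(self._chunks)
            except StopIteration:
                return 0  # signals end of stream
        n = min(len(target), len(self._buffer))
        target[:n] = self._buffer[:n]
        self._buffer = self._buffer[n:]
        return n


def iter_instance_texts(chunks, tag="InstanceElement"):
    """Yield the text of each matching element while the data is still arriving."""
    # Imported here so the adapter above has no third-party dependency.
    from lxml import etree

    for _event, elem in etree.iterparse(ChunkReader(chunks), tag=tag):
        yield elem.text
        elem.clear()  # drop the element's content once it has been handled


# Assumed usage with the client from the answer (not executed here):
# for text in iter_instance_texts(FileClient.download_file().chunks()):
#     print(text)
```

The adapter only holds one chunk at a time, so memory use should stay roughly bounded by the chunk size plus whatever the parser keeps for the elements that are still open.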
Content of my test.xml:
Result: