
Python: how to read a doc file from Azure Blob Storage?

I have a docx file in blob storage.

What I am trying to do is get the link, path, or URL of the file in the blob so that I can apply this function:

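import zipfile
from xml.etree.ElementTree import XML

# Assumed definitions (not shown in the original post): the WordprocessingML
# namespace and the paragraph/text tags used in word/document.xml
WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'

def get_docx_text(path):
    """
    Take the path of a docx file as argument, return a tuple of
    (list of paragraphs, full text joined by blank lines).
    """
    # The context manager closes the archive even if reading fails
    with zipfile.ZipFile(path) as document:
        xml_content = document.read('word/document.xml')
    tree = XML(xml_content)

    paragraphs = []
    # iter() replaces getiterator(), which was removed in Python 3.9
    for paragraph in tree.iter(PARA):
        texts = [node.text
                 for node in paragraph.iter(TEXT)
                 if node.text]
        if texts:
            paragraphs.append(''.join(texts))

    text = '\n\n'.join(paragraphs)
    return (paragraphs, text)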

I would like to pass the path of the file as the path argument of get_docx_text(path).

How can I do this?

I tried something like this, but it doesn't work:

from azure.storage.blob import BlobServiceClient

connection_string = '...'
service_client = BlobServiceClient.from_connection_string(connection_string)

client = service_client.get_container_client("name_container")

bc = client.get_blob_client(blob="bronze/txt_name.docx")

# Download the blob and save it next to the script
with open("txt_name.docx", 'wb') as file:
    data = bc.download_blob()
    file.write(data.readall())

Thank you, Gaurav, for the suggestion in the comments. I am posting it as an answer to help other community members.

Issue: ResourceNotFoundError: The specified blob does not exist.

Solution: Pass the blob's full path inside the container, which here includes the sink/ prefix. Please try this code:

bc = client.get_blob_client(blob="sink/bronze/txt_name.docx")
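If it is not obvious what the blob's full name is, listing the container can help. The following is only a minimal sketch: find_blob_names and main are hypothetical helper names, the connection string stays elided as in the question, and it assumes the azure-storage-blob package, whose list_blobs() yields items with a name attribute holding the full path inside the container.

```python
def find_blob_names(container_client, suffix):
    """Return the full names of blobs in the container that end with `suffix`."""
    # Each item yielded by list_blobs() exposes .name, the blob's full path
    # inside the container (for example "sink/bronze/txt_name.docx").
    return [blob.name
            for blob in container_client.list_blobs()
            if blob.name.endswith(suffix)]


def main():
    # Not run on import: needs the azure-storage-blob package and real credentials.
    from azure.storage.blob import BlobServiceClient

    connection_string = "..."  # elided in the original post
    service_client = BlobServiceClient.from_connection_string(connection_string)
    container_client = service_client.get_container_client("name_container")
    for name in find_blob_names(container_client, "txt_name.docx"):
        print(name)
```

Whatever name this prints is the value to pass as blob= in get_blob_client.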

Since you're downloading the blob into the same folder where your code is running, you only have to specify the name under which you're saving the file.

For example, in this code:

with open("txt_name.docx", 'wb') as file:
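Once the file is saved locally, that same name ("txt_name.docx") can be passed to get_docx_text. If you would rather not write a local file at all, one possible alternative (not part of the original answer) is to parse the downloaded bytes in memory, since zipfile.ZipFile also accepts a file-like object. This is a sketch only: get_docx_text_from_bytes and get_docx_text_from_blob are hypothetical names, and the namespace constants are assumed to match the PARA and TEXT used in the question.

```python
import io
import zipfile
from xml.etree.ElementTree import XML

# Assumed constants: WordprocessingML namespace used in word/document.xml
WORD_NAMESPACE = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
PARA = WORD_NAMESPACE + "p"
TEXT = WORD_NAMESPACE + "t"


def get_docx_text_from_bytes(data):
    """Take the raw bytes of a docx file, return (paragraphs, text)."""
    # ZipFile accepts any seekable file-like object, so BytesIO stands in for a path
    with zipfile.ZipFile(io.BytesIO(data)) as document:
        xml_content = document.read("word/document.xml")
    tree = XML(xml_content)

    paragraphs = []
    for paragraph in tree.iter(PARA):
        texts = [node.text for node in paragraph.iter(TEXT) if node.text]
        if texts:
            paragraphs.append("".join(texts))
    return paragraphs, "\n\n".join(paragraphs)


def get_docx_text_from_blob(blob_client):
    """Hypothetical wrapper: download the blob into memory and parse it."""
    data = blob_client.download_blob().readall()
    return get_docx_text_from_bytes(data)
```

With the bc client from the question (pointing at the correct blob name), usage would look like paragraphs, text = get_docx_text_from_blob(bc). This holds the whole file in memory, which is usually fine for documents but may not suit very large blobs.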

