
Python: how to read a docx file from Azure Blob Storage?

I have a docx file in Azure Blob Storage.

I want to get the link, path, or URL of the file in the blob so that I can apply this function to it:

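import zipfile
from xml.etree.ElementTree import XML

# WordprocessingML namespace used by the tags inside word/document.xml
WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'

def get_docx_text(path):
    """
    Take the path of a docx file as argument, return a tuple of
    (list of paragraphs, full text).
    """
    # A docx file is a zip archive; the body lives in word/document.xml
    with zipfile.ZipFile(path) as document:
        xml_content = document.read('word/document.xml')
    tree = XML(xml_content)

    paragraphs = []
    # iter() replaces getiterator(), which was removed in Python 3.9
    for paragraph in tree.iter(PARA):
        texts = [node.text
                 for node in paragraph.iter(TEXT)
                 if node.text]
        if texts:
            paragraphs.append(''.join(texts))

    text = '\n\n'.join(paragraphs)
    return (paragraphs, text)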

I would like to pass the path of the file as the path parameter of get_docx_text(path).

How can I do this?

I tried something like this, but it doesn't work:

from azure.storage.blob import BlobServiceClient

connection_string = '...'
service_client = BlobServiceClient.from_connection_string(connection_string)
client = service_client.get_container_client("name_container")
bc = client.get_blob_client(blob="bronze/txt_name.docx")

# Download the blob and save it in the folder the script runs from
with open("txt_name.docx", 'wb') as file:
    data = bc.download_blob()
    file.write(data.readall())

Thank you, Gaurav, for the suggestion in the comments. I am posting it as an answer to help other community members.

Issue: ResourceNotFoundError: The specified blob does not exist.

Solution: the blob name must be its full path inside the container, which here includes the sink/ prefix. Please try this code:

bc = client.get_blob_client(blob="sink/bronze/txt_name.docx")
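If you are not sure what the full blob name is, one way to check is to list the blobs in the container and look for the file. The following is only a minimal sketch: find_blob_names is a hypothetical helper name, and it assumes container_client is the ContainerClient from the question (the variable called client there).

```python
def find_blob_names(container_client, suffix):
    """Return the names of all blobs in the container that end with `suffix`."""
    # list_blobs() yields one item per blob; its `name` attribute is the
    # blob's full path inside the container (for example "sink/bronze/...").
    return [blob.name
            for blob in container_client.list_blobs()
            if blob.name.endswith(suffix)]
```

Calling find_blob_names(client, "txt_name.docx") should return the exact string to pass as blob= in get_blob_client. An empty list suggests the file is in a different container.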

Since you're downloading the blob into the same folder where your code is running, you only have to specify the name under which you're saving the file. That same name is the path to pass to get_docx_text.

For example, in this code:

with open("txt_name.docx", 'wb') as file:
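If you would rather not write a local file at all, zipfile.ZipFile also accepts a file-like object, so the downloaded bytes can be wrapped in io.BytesIO and parsed in memory. This is a sketch under assumptions, not a tested Azure recipe: get_docx_text_from_bytes and read_docx_blob are hypothetical names, and blob_client is assumed to be the bc object from the question.

```python
import io
import zipfile
from xml.etree.ElementTree import XML

# WordprocessingML namespace used by the tags inside word/document.xml
WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'


def get_docx_text_from_bytes(data):
    """Return (paragraphs, text) for a docx file given as raw bytes."""
    # Wrap the bytes in a file-like object so ZipFile can read them
    with zipfile.ZipFile(io.BytesIO(data)) as document:
        xml_content = document.read('word/document.xml')
    tree = XML(xml_content)

    paragraphs = []
    for paragraph in tree.iter(PARA):
        texts = [node.text for node in paragraph.iter(TEXT) if node.text]
        if texts:
            paragraphs.append(''.join(texts))
    return paragraphs, '\n\n'.join(paragraphs)


def read_docx_blob(blob_client):
    """Download a blob into memory and extract its text (hypothetical helper)."""
    # download_blob().readall() returns the whole blob content as bytes
    return get_docx_text_from_bytes(blob_client.download_blob().readall())
```

With the client from the question, usage would look like paragraphs, text = read_docx_blob(bc). Reading the whole blob into memory is fine for ordinary documents; for very large files, saving to disk first may be the safer choice.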
