
How to read docx files from azure blob using Python

How can I read .docx files from Azure Blob Storage using Python? With the following code, blob_content ends up containing only unreadable characters. The code works fine for .txt files but not for MS Word documents (*.docx).

Please help if you have any solution.

blob_service_client_instance = BlobServiceClient(account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)
blob_client_instance = blob_service_client_instance.get_blob_client(container_name, blob_name, snapshot=None)
blob_download = blob_client_instance.download_blob()
blob_content = blob_download.readall().decode('utf-8')

I tried this in my environment and got the results below.

First, I used the following code in Visual Studio Code to download the docx file from Azure Blob Storage.

In the portal, I have a docx file stored in Azure Blob Storage.


from azure.storage.blob import BlobServiceClient

# Connect to the storage account and locate the blob
client = BlobServiceClient.from_connection_string("<Connection string>")
container_client = client.get_container_client("test")
bc = container_client.get_blob_client(blob="sample.docx")

# Write the raw bytes to a local file; "wb" keeps the binary content intact
with open("sample.docx", "wb") as file:
    data = bc.download_blob()
    file.write(data.readall())

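The above code worked and downloaded the docx file from Azure Blob Storage. When I open the downloaded file, it opens in the source code editor rather than in a docx editor, which is expected because a .docx file is a binary format rather than plain text.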
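The unreadable characters come from the file format itself: a .docx file is a ZIP archive of XML parts, so calling .decode('utf-8') on the downloaded bytes either fails or produces garbage. As a minimal sketch using only the standard library, the check below confirms that a downloaded payload is such an archive. The helper name looks_like_docx is hypothetical and not part of any library.

```python
import io
import zipfile


def looks_like_docx(data: bytes) -> bool:
    """Return True if the bytes form a ZIP archive that contains word/document.xml."""
    buffer = io.BytesIO(data)
    # Plain text (or any non-ZIP payload) is rejected here.
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        # Word stores the main document body in this archive member.
        return "word/document.xml" in archive.namelist()
```

Passing the result of download_blob().readall() to this helper should return True for a Word document and False for a .txt blob.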


Next, I used the following code, based on the python-docx package, to read the docx file that was downloaded from Azure Blob Storage.

Code:

import docx

# Open the downloaded file and print the text of each paragraph
doc = docx.Document("<path of the downloaded file >")
all_paras = doc.paragraphs
for para in all_paras:
    print(para.text)

After executing the above code, I was able to read the docx file successfully.

