How can I read docx files from Azure Blob Storage using Python? I use the following code, but blob_content ends up containing only unreadable characters. The code works fine for txt files but not for MS Word documents (*.docx).
Please help if you have a solution.
blob_service_client_instance = BlobServiceClient(account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)
blob_client_instance = blob_service_client_instance.get_blob_client(container_name, blob_name, snapshot=None)
blob_download = blob_client_instance.download_blob()
blob_content = blob_download.readall().decode('utf-8')
I tried this in my environment and got the results below:
First, I used the following code in Visual Studio Code to download the docx file from Azure Blob Storage.
In the portal, I have a docx file in Azure Blob Storage:
from azure.storage.blob import BlobServiceClient

client = BlobServiceClient.from_connection_string("<Connection string>")
container_client = client.get_container_client("test")
bc = container_client.get_blob_client(blob="sample.docx")
# Write the raw bytes to disk; a docx file is binary, so it is not decoded as text.
with open("sample.docx", "wb") as file:
    data = bc.download_blob()
    file.write(data.readall())
The above code worked and downloaded the docx file from Azure Blob Storage. When I tried to open the downloaded file in Visual Studio Code, it opened in the source code editor rather than in a docx viewer.
Console:
After that, I used the following code to read the docx file that was downloaded from Azure Blob Storage.
Code:
import docx  # provided by the python-docx package

doc = docx.Document("<path of the downloaded file>")
all_paras = doc.paragraphs
# Print the text of every paragraph in the document.
for para in all_paras:
    print(para.text)
Console: After executing the above code, I was able to read the docx file successfully.
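As for why .decode('utf-8') gives unreadable characters in the question: a docx file is a ZIP archive of XML parts, not plain text, so the bytes returned by readall() have to be unpacked rather than decoded. Below is a minimal sketch of that idea using only the standard library instead of python-docx. The helper name docx_bytes_to_paragraphs is hypothetical, and it only looks at the main body in word/document.xml, so headers, footers, and footnotes are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_bytes_to_paragraphs(data: bytes) -> list:
    """Hypothetical helper: return the text of each paragraph in the main body."""
    # The bytes are a ZIP archive (they start with b"PK"), so open them
    # as one in memory instead of decoding them as UTF-8.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # One paragraph (<w:p>) is made of one or more text runs (<w:t>).
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs


# Usage with the question's variables (assumes the blob client from the question):
# blob_bytes = blob_client_instance.download_blob().readall()  # no .decode()
# for text in docx_bytes_to_paragraphs(blob_bytes):
#     print(text)
```

If you would rather stay with python-docx, docx.Document also accepts a file-like object, so passing io.BytesIO(blob_bytes) instead of a path should avoid writing the file to disk first.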