Open an Azure StorageStreamDownloader without saving it as a file

Question

I need to download a PDF from a blob container in Azure as a download stream (StorageStreamDownloader) and open it in both pdfplumber and pdfminer. I developed everything by loading the PDFs from files on disk, but I can't manage to receive a download stream (StorageStreamDownloader) and open it successfully. I was opening the PDFs like this:

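import pdfplumber
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfparser import PDFParser

pdf = pdfplumber.open(pdfpath)  # for pdfplumber
fp = open('Pdf/' + fileGlob, 'rb')  # for pdfminer
parser = PDFParser(fp)
document = PDFDocument(parser)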

However, I need to be able to work from a download stream instead. This is the code snippet that currently downloads the PDF as a file:

blob_client = container.get_blob_client(remote_file)
with open(local_file_path, "wb") as local_file:
    download_stream = blob_client.download_blob()
    local_file.write(download_stream.readall())
    # no explicit close() needed: the with block closes the file

I tried several options, even using a temp file, with no luck. Any ideas?

Answer 1

download_blob() returns a StorageStreamDownloader object, and that class has a download_to_stream() method. Use it to write the blob's contents into an in-memory stream.

from io import BytesIO

import PyPDF2
from azure.storage.blob import BlobServiceClient

filename = "test.pdf"
container_name = "test"

blob_service_client = BlobServiceClient.from_connection_string("connection string")
container_client = blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(filename)
streamdownloader = blob_client.download_blob()

# Copy the blob into an in-memory buffer instead of a file on disk
stream = BytesIO()
streamdownloader.download_to_stream(stream)
stream.seek(0)  # rewind so readers start at the beginning

# Note: PyPDF2 3.x renamed PdfFileReader to PdfReader and numPages to len(reader.pages)
fileReader = PyPDF2.PdfFileReader(stream)
print(fileReader.numPages)

This prints the number of pages in the PDF.
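Since the question asks for pdfplumber and pdfminer rather than PyPDF2, here is a minimal sketch of handing the same in-memory stream to both. It assumes both libraries are installed and relies on pdfplumber.open() and PDFParser() accepting a binary file-like object in place of a path or open file. The helper names rewound, open_with_pdfplumber and open_with_pdfminer are made up for illustration.

```python
from io import BytesIO


def rewound(stream: BytesIO) -> BytesIO:
    """Move the stream position back to the start and return the same stream."""
    stream.seek(0)
    return stream


def open_with_pdfplumber(stream: BytesIO):
    """Open an in-memory PDF with pdfplumber."""
    import pdfplumber  # third-party, imported lazily; assumed installed

    return pdfplumber.open(rewound(stream))


def open_with_pdfminer(stream: BytesIO):
    """Build a pdfminer PDFDocument from an in-memory PDF."""
    from pdfminer.pdfdocument import PDFDocument  # third-party; assumed installed
    from pdfminer.pdfparser import PDFParser

    return PDFDocument(PDFParser(rewound(stream)))
```

With stream filled by the download call in the answer's code, open_with_pdfplumber(stream) would stand in for pdfplumber.open(pdfpath), and open_with_pdfminer(stream) for the open()/PDFParser/PDFDocument sequence from the question. Rewinding first matters because the stream position sits at the end of the data right after the download.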

Answer 2

It seems download_to_stream() is now deprecated; readinto() should be used instead.

from azure.storage.blob import BlobClient


conn_string = ''
container_name = ''
blob_name = ''
blob_obj = BlobClient.from_connection_string(
    conn_str=conn_string, container_name=container_name,
    blob_name=blob_name
)
with open(blob_name, 'wb') as f:
    b = blob_obj.download_blob()
    b.readinto(f)

This creates a file in the working directory containing the downloaded data.
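Because readinto() writes into whatever writable binary stream it is given, passing a BytesIO instead of an open file should keep the download in memory, which is what the question asks for. A minimal sketch, assuming the object passed in behaves like the StorageStreamDownloader returned by download_blob() (only its readinto(stream) method is used); download_to_buffer is a hypothetical helper name.

```python
from io import BytesIO


def download_to_buffer(downloader) -> BytesIO:
    """Read a blob into memory and return a stream positioned at byte 0.

    `downloader` is expected to be the object returned by
    BlobClient.download_blob(); only its readinto(stream) method is used.
    """
    buffer = BytesIO()
    downloader.readinto(buffer)  # writes the blob's bytes into the buffer
    buffer.seek(0)               # rewind so a PDF library reads from the start
    return buffer
```

Usage would look like buffer = download_to_buffer(blob_obj.download_blob()), after which buffer can be passed to pdfplumber.open() or PDFParser() in place of a file opened from disk.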

Answer 3

Simply call readall() on the result of download_blob(); it returns the blob's contents as bytes.

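from azure.storage.blob import BlobClient

conn_string = ''
container_name = ''
blob_name = ''
blob_obj = BlobClient.from_connection_string(conn_string, container_name, blob_name)

# readall() returns the entire blob as a bytes object
b = blob_obj.download_blob().readall()

# Optional: persist the bytes; skip this step to keep the data in memory only
with open(blob_name, 'wb') as f:
    f.write(b)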
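The bytes returned by readall() can be wrapped in a BytesIO to get a file-like object without touching the disk. A minimal sketch, assuming the PDF fits comfortably in memory; read_blob_as_stream is a hypothetical helper name, and downloader stands for the result of download_blob().

```python
from io import BytesIO


def read_blob_as_stream(downloader) -> BytesIO:
    """Return the blob's content as a seekable in-memory stream.

    Only the downloader's readall() method is used.
    """
    data = downloader.readall()  # entire blob content as bytes
    return BytesIO(data)         # a fresh BytesIO starts at position 0
```

The returned stream needs no rewinding and can be handed to pdfplumber.open() or PDFParser() directly, for example read_blob_as_stream(blob_obj.download_blob()).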

3 answers

Answer 1: 6, accepted, 2019-12-13 11:29:17

Answer 2: 1, 2020-04-27 11:07:11

Answer 3: 0, 2021-09-03 06:44:18
