
Azure: Read Data from Blob Storage with Python (No Downloading)

I'm trying to open a series of cracked documents / texts that we've stored in Azure Blob Storage, ideally loading them all into a pandas DataFrame. I do not want to download them to disk (I'll be opening them from a Docker container); I just want to hold the information in memory.

The file structure looks like: Azure Blob Storage -> MyContainer -> UUIDFolderNames (many) -> 1 "knowledge.json" file in each Folder.

What I've got working:

from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string("<my connection str>", "<MyContainer>")
blob_list = container.list_blobs()
for blob in blob_list:
    blobClient = container.get_blob_client(blob)  # Not sure this is needed

Ideally, for each item in my for loop, I'd open the .json file and add its text to a row in my DataFrame. However, I can't actually manage to open any of the JSON files.

What I've tried:

#1
name = blob.name 
json.loads( name )

#2
with open(name, 'r') as f:
    data = json.load( f )

Errors:

#1: JSONDecodeError: Expecting value: line 1 column 1 (char 0)

#2: No such file or directory

I've also tried sillier things like json.loads( blob ) or json.loads('knowledge.json') (no folder name in the path), but those were nonsensical attempts made just to see whether they worked; they're not exactly reasonable.

Most approaches (including those in Azure's documentation) download the file first, but there are examples where the file is simply opened directly. The latter is what I'm trying to do.

*Edit: I realized that it's somewhat obvious why the files cannot be found: json.load etc. will look in my local directory (where I'm running the Python file from) rather than in the blob location. Still, I'm not sure how to load a file without downloading it.
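The two failures can be reproduced without Azure at all, which may make the cause clearer. The sketch below is only an illustration: the folder name is a made-up placeholder, and the bytes literal stands in for whatever a blob read would return. It shows that json.loads rejects the blob's name (a path string, not JSON text) but accepts the blob's content.

```python
import json

# Hypothetical blob name following the folder layout described above.
name = "a1b2c3d4-0000-0000-0000-000000000000/knowledge.json"

# Attempt #1: the name is a path, not JSON text, so parsing fails.
try:
    json.loads(name)
    parsed_name = True
except json.JSONDecodeError as exc:
    parsed_name = False
    name_error = str(exc)

# What json.loads needs instead is the blob's content. These bytes are a
# stand-in for the data read from the blob; json.loads accepts bytes directly.
content = b'{"title": "example", "text": "hello"}'
record = json.loads(content)
```

So the missing piece is a call that returns the blob's bytes in memory, rather than a local file path.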

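The block below reads each blob's content and prints it. download_blob() returns a stream object and readall() pulls the bytes into memory, so nothing is written to local disk: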

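# container_client, service_client and Container_name are created in the full code below
for blob in container_client.list_blobs():
    blob_client = service_client.get_blob_client(container=Container_name, blob=blob.name)
    content = blob_client.download_blob()
    contentastext = content.readall()  # bytes, held in memory
    print(contentastext)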

Below is the full code to read the JSON files from blob storage; you can then add the data to your DataFrame:

from azure.storage.blob import BlobServiceClient

def read_blobs():
    # Reads every blob in the container into memory and prints it
    CONNECTION_STRING = "ENTER_CONNECTION_STR"
    Container_name = "gatherblobs"
    service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    container_client = service_client.get_container_client(Container_name)

    for blob in container_client.list_blobs():
        blob_client = service_client.get_blob_client(container=Container_name, blob=blob.name)
        content = blob_client.download_blob()
        contentastext = content.readall()  # bytes, held in memory
        print(contentastext)

if __name__ == '__main__':
    read_blobs()
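To get from printed bytes to the one-row-per-file DataFrame the question asks for, something like the following might work. It is a minimal sketch under a few assumptions: each knowledge.json holds a single JSON object at the top level, and the helper names (load_json_blobs, build_dataframe) and the blob_name column are made up for illustration. The parsing helper only relies on list_blobs() and download_blob(...).readall(), so it can be tried against a stand-in object before pointing it at a real container.

```python
import json


def load_json_blobs(container, suffix="knowledge.json"):
    """Return one dict per matching blob, read entirely in memory.

    `container` is expected to behave like azure.storage.blob.ContainerClient:
    list_blobs() yields items with a .name attribute, and
    download_blob(name).readall() returns the blob's bytes.
    Assumes every matching blob contains a top-level JSON object.
    """
    records = []
    for blob in container.list_blobs():
        if not blob.name.endswith(suffix):
            continue  # skip anything that is not a knowledge.json file
        raw = container.download_blob(blob.name).readall()
        record = json.loads(raw)
        # Keep the blob path so each row can be traced back to its UUID folder.
        records.append({"blob_name": blob.name, **record})
    return records


def build_dataframe(connection_string, container_name):
    """Hypothetical wrapper: connect to the container and build the DataFrame."""
    # Imported here so load_json_blobs can be tried without these packages.
    import pandas as pd
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(
        connection_string, container_name
    )
    return pd.DataFrame(load_json_blobs(container))
```

Calling build_dataframe("ENTER_CONNECTION_STR", "gatherblobs") with real values would then give one row per knowledge.json file, with one column per top-level JSON key.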
