
Saving Matplotlib Output to Blob Storage on Databricks

I'm trying to write matplotlib figures to Azure Blob Storage using the method described here: Saving Matplotlib Output to DBFS on Databricks.

However, when I replace the path in the code with

path = 'wasbs://test@someblob.blob.core.windows.net/'

I get this error

[Errno 2] No such file or directory: 'wasbs://test@someblob.blob.core.windows.net/'

I don't understand the problem...

As per my research, you cannot save Matplotlib output to Azure Blob Storage directly: plt.savefig writes through the ordinary local filesystem API, which does not understand wasbs:// URIs, hence the [Errno 2] No such file or directory error.

You can follow the steps below to save Matplotlib output to Azure Blob Storage:

Step 1: First save the figure to the Databricks File System (DBFS), then copy it to Azure Blob Storage.

Saving Matplotlib output to the Databricks File System (DBFS): use plt.savefig('/dbfs/myfolder/Graph1.png') to save the output to DBFS:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'fruits': ['apple', 'banana'], 'count': [1, 2]})
plt.close()  # close any figure left over from a previous cell
df.set_index('fruits', inplace=True)
df.plot.bar()
# The /dbfs/ prefix exposes DBFS through the local filesystem;
# the target folder must already exist.
plt.savefig('/dbfs/myfolder/Graph1.png')
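
Note that savefig does not create missing directories. A small sketch (folder name taken from the snippet above) to make sure the target folder exists before saving:

import os

# /dbfs/ is the local-filesystem view of DBFS; create the folder if it's missing.
os.makedirs('/dbfs/myfolder', exist_ok=True)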


Step 2: Copy the file from Databricks File System to Azure Blob Storage.

There are two methods to copy a file from DBFS to Azure Blob Storage.

Method 1: Access Azure Blob storage directly

Access Azure Blob Storage directly by setting the account key with spark.conf.set, then copy the file from DBFS to Blob Storage.

spark.conf.set("fs.azure.account.key.<Blob Storage Name>.blob.core.windows.net", "<Azure Blob Storage Key>")
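
Hard-coding the account key in a notebook leaks it to anyone who can read the notebook. A minimal sketch of the same setting read from a Databricks secret scope instead (the scope name azurestorage and key name azurestoragekey are assumptions, reusing the names that appear in Method 2 below):

# Scope/key names are hypothetical; create them beforehand with the
# Databricks secrets CLI or API.
storage_account = "<Blob Storage Name>"
spark.conf.set(
    "fs.azure.account.key." + storage_account + ".blob.core.windows.net",
    dbutils.secrets.get(scope="azurestorage", key="azurestoragekey"))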

Use dbutils.fs.cp to copy the file from DBFS to Azure Blob Storage:

dbutils.fs.cp('dbfs:/myfolder/Graph1.png', 'wasbs://<Container>@<Storage Name>.blob.core.windows.net/Azure')


Method 2: Mount Azure Blob storage containers to DBFS

You can mount a Blob storage container or a folder inside a container to Databricks File System (DBFS). The mount is a pointer to a Blob storage container, so the data is never synced locally.

dbutils.fs.mount(
  source = "wasbs://sampledata@chepra.blob.core.windows.net/Azure",
  mount_point = "/mnt/chepra",
  extra_configs = {"fs.azure.sas.sampledata.chepra.blob.core.windows.net":dbutils.secrets.get(scope = "azurestorage", key = "azurestoragekey")})
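
One caveat worth noting: dbutils.fs.mount fails if the mount point is already in use, so it can help to check dbutils.fs.mounts() first. A sketch, reusing the mount point from the example above:

# Mounting twice raises an error, so only mount if /mnt/chepra is not mounted yet.
if not any(m.mountPoint == "/mnt/chepra" for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source = "wasbs://sampledata@chepra.blob.core.windows.net/Azure",
        mount_point = "/mnt/chepra",
        extra_configs = {"fs.azure.sas.sampledata.chepra.blob.core.windows.net": dbutils.secrets.get(scope = "azurestorage", key = "azurestoragekey")})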

Use dbutils.fs.cp to copy the file to the mounted Azure Blob Storage container:

dbutils.fs.cp('dbfs:/myfolder/Graph1.png', 'dbfs:/mnt/chepra/Graph1.png')
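
To confirm the copy landed in the container, a quick check over the mount (mount point as above):

display(dbutils.fs.ls("dbfs:/mnt/chepra"))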


By following Method 1 or Method 2 you can successfully save the output to Azure Blob Storage.


For more details, refer to "Databricks - Azure Blob Storage".

Hope this helps. Do let us know if you have any further queries.

This is what I have also come up with so far. To reload the image from blob storage and display it as a PNG in a Databricks notebook again, I use the following code:

from io import BytesIO

import matplotlib.image as mpimg
import matplotlib.pyplot as plt

blob_path = ...
dbfs_path = ...
dbutils.fs.cp(blob_path, dbfs_path)

# open() needs the local-filesystem form of the DBFS path, i.e. /dbfs/...
with open(dbfs_path, "rb") as f:
    im = BytesIO(f.read())

img = mpimg.imread(im)
imgplot = plt.imshow(img)
display(imgplot.figure)
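
If the blob container is mounted as in Method 2 above, the intermediate dbutils.fs.cp step can be skipped by reading straight through the /dbfs/ local path. A sketch, assuming the mount point /mnt/chepra and the file name from earlier:

import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# /dbfs/mnt/chepra is the local-filesystem view of the mounted container.
img = mpimg.imread('/dbfs/mnt/chepra/Graph1.png')
plt.imshow(img)
display(plt.gcf())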

You can write with .savefig() directly to Azure Blob Storage; you just need to mount the blob container first.

The following works for me, where I had mounted the blob container as /mnt/mydatalakemount:

plt.savefig('/dbfs/mnt/mydatalakemount/plt.png')

or

fig.savefig('/dbfs/mnt/mydatalakemount/fig.png')
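
Put together, a minimal end-to-end sketch (the mount point /mnt/mydatalakemount is taken from above; the plotted data is made up for illustration):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(['apple', 'banana'], [1, 2])  # illustrative data only
# Writing through /dbfs/ lands the file directly in the mounted container.
fig.savefig('/dbfs/mnt/mydatalakemount/fig.png')
plt.close(fig)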

Documentation on mounting a blob container is here.

I didn't succeed with dbutils, which could not be created correctly in my environment. But I did succeed by mounting the file shares to a Linux path, as described here: https://learn.microsoft.com/en-us/azure/azure-functions/scripts/functions-cli-mount-files-storage-linux
