
How to read parquet file into pandas from Azure blob store

I need to read and write parquet files from an Azure blob store within the context of a Jupyter notebook running Python 3 kernel.

I see code for working strictly with parquet files in Python, and other code for reading from and writing to an Azure blob store, but nothing yet that puts it all together.

Here is some sample code I'm playing with:

from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='testdata', account_key='key-here')
block_blob_service.get_blob_to_text(container_name='mycontainer', blob_name='testdata.parquet')

That last line will throw an encoding-related error. I've played with storefact but am coming up short there.

Thanks for any help

To access the file, you first need to connect to the Azure blob storage account.

storage_account_name = "your storage account name"
storage_account_access_key = "your storage account access key"

Store the path of the parquet file in a variable

commonLOB_mst_source = "Parquet file path"
file_type = "parquet"

Connect to blob storage

spark.conf.set(
  "fs.azure.account.key."+storage_account_name+".blob.core.windows.net",
  storage_account_access_key)

Read Parquet file into dataframe

df = spark.read.format(file_type).option("inferSchema", "true").load(commonLOB_mst_source)
