Access Azure Blob Storage through R

I'm trying to use R to connect to Azure Blob Storage, where I have some CSV files stored. I need to load them into a data frame and apply some transformations before writing them back to another Blob container. I'm trying to do this through Databricks so that I can ultimately call this notebook from Data Factory and include it in a pipeline.

Databricks gives me a sample notebook in Python, where a connection can be made with the following code:

# Storage account name, access key, and the wasb:// URL of the CSV file
storage_account_name = "testname"
storage_account_access_key = "..."
file_location = "wasb://example@testname.blob.core.windows.net/testfile.csv"

# Register the account key with the Spark session so Spark can read the container
spark.conf.set(
  "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
  storage_account_access_key)

# Read the CSV into a Spark DataFrame
df = spark.read.format('csv').load(file_location, header=True, inferSchema=True)

Is there something similar in R? I could use the SparkR or sparklyr package if either helps me load a file into a Spark DataFrame.

For your information, R is not capable of performing the actual mount. The workaround is to mount the container using another language such as Python, and then read the file using the SparkR library, as shown below.

The two most commonly used libraries that provide an R interface to Spark are SparkR and sparklyr. Databricks notebooks and jobs support both packages, although you cannot use functions from both SparkR and sparklyr with the same object.

Mount using Python:

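The original answer showed the mount as a screenshot, which is not preserved here. Below is a minimal sketch of a Databricks Python mount cell, assuming the account, container, and file names from the question; the mount point /mnt/blob-mount is a hypothetical choice, and storage_account_access_key is the variable from the Python cell above.

# Sketch: mount the "example" container so other languages (including R) can read it.
# /mnt/blob-mount is a made-up mount point name.
dbutils.fs.mount(
  source = "wasbs://example@testname.blob.core.windows.net",
  mount_point = "/mnt/blob-mount",
  extra_configs = {
    "fs.azure.account.key.testname.blob.core.windows.net": storage_account_access_key
  }
)

In practice you would normally fetch the key from a Databricks secret scope rather than hard-coding it in the notebook.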

Run the R notebook using the SparkR library:

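This screenshot is likewise not preserved. A minimal SparkR sketch, assuming the /mnt/blob-mount mount point from the Python cell above and a second hypothetical mount /mnt/output-mount for the output container:

library(SparkR)

# Read the CSV from the mounted container into a Spark DataFrame
df <- read.df("/mnt/blob-mount/testfile.csv",
              source = "csv", header = "true", inferSchema = "true")

# ... apply transformations here ...

# Write the result back via the (hypothetical) output container's mount point
write.df(df, path = "/mnt/output-mount/testfile_out",
         source = "csv", mode = "overwrite")

If you prefer sparklyr, the equivalent read would look like this (again a sketch; remember not to mix SparkR and sparklyr functions on the same object):

library(sparklyr)

sc <- spark_connect(method = "databricks")
df_tbl <- spark_read_csv(sc, name = "testfile",
                         path = "/mnt/blob-mount/testfile.csv")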
