
Access files in blob storage in R scripts in Azure Machine Learning?

What is an easy way to read and write files in blob storage from R scripts in Azure Machine Learning?

I can access files in blob storage from Python scripts using the azure modules, but there seems to be no easy way to do the same from R scripts.

I tried to import AzureSMR as a zip file in the R script, but importing all of its dependencies is very tedious:

https://github.com/Microsoft/AzureSMR

Any suggestions or help would be appreciated.

It sounds like you already know how to install and use R packages on Azure ML. If not, please see the document Installing R package in Azure Machine Learning, install the package that way, and try again.

In my experience, the R package AzureSMR is designed for Azure resource management in general, not only for Azure Storage. So it is not a good fit for Azure ML: you would need to do extra work, such as registering an app in Azure AD, before code that uses its APIs works.

My suggestion is to call the REST APIs of Azure Blob Storage with the R package httr inside the Execute R Script module of Azure ML. You can refer to the SO thread Azure PUT Blob authentication fails in R to see how to do this. Meanwhile, the source code of AzureSMR is a valuable reference that you can reuse and adapt for common functions such as authentication and the blob CRUD operations.
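To make the authentication part more concrete, here is a minimal sketch of how the Shared Key string-to-sign for a blob request could be assembled in base R. The function name build_string_to_sign and its parameters are hypothetical, not part of httr or AzureSMR, and the layout assumes x-ms-version 2015-02-21, the same version used in the linked SO thread. It only builds the string; signing it and sending the request are separate steps.

```r
# Hypothetical helper (illustration only): build the Shared Key string-to-sign
# for a Blob service request, assuming x-ms-version 2015-02-21.
build_string_to_sign <- function(verb, account, container, blob, request_date,
                                 content_length = "", content_type = "",
                                 blob_type = NULL, version = "2015-02-21") {
  # x-ms-* headers go into the canonicalized headers, sorted by header name.
  ms_headers <- c(`x-ms-date` = request_date, `x-ms-version` = version)
  if (!is.null(blob_type)) {
    ms_headers <- c(ms_headers, `x-ms-blob-type` = blob_type)
  }
  ms_headers <- ms_headers[order(names(ms_headers), method = "radix")]
  canonical_headers <- paste0(names(ms_headers), ":", ms_headers, collapse = "\n")

  # Twelve standard fields: verb, Content-Encoding, Content-Language,
  # Content-Length, Content-MD5, Content-Type, Date, If-Modified-Since,
  # If-Match, If-None-Match, If-Unmodified-Since, Range.
  # Unused fields stay empty; Content-Length is empty for requests without a body.
  standard_fields <- c(verb, "", "", as.character(content_length), "",
                       content_type, "", "", "", "", "", "")

  canonical_resource <- paste0("/", account, "/", container, "/", blob)

  paste(c(standard_fields, canonical_headers, canonical_resource), collapse = "\n")
}

# Example with placeholder values: string-to-sign for uploading an 11-byte text blob.
put_string <- build_string_to_sign("PUT", "accountname", "containername", "test.txt",
                                   "Mon, 01 Jan 2024 00:00:00 GMT",
                                   content_length = 11, content_type = "text/plain",
                                   blob_type = "BlockBlob")
```

The resulting string would then be signed with HMAC-SHA256 using the base64-decoded account key, and the base64-encoded signature goes into the Authorization header as "SharedKey account:signature".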

Hope it helps. If you have any concerns, please feel free to let me know.

Thank you for your suggestion, Peter Pan.

I followed Azure PUT Blob authentication fails in R.

However, the script fails to run. The error message was:

error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list

I thought the problem might be related to https access. (I say this because, when using a Python script to access blob storage in Azure ML, I had also followed Access Azure blog storage from within an Azure ML experiment.)

The same problem is reported for R in Error:1411809D:SSL routines - When trying to make https call from inside R module in AzureML

So I changed https to http. But now the script tries to access the blob storage many times and never finishes running. I can see the number of requests increasing sharply for the storage account in the Azure portal.

My code is essentially the same as in Azure PUT Blob authentication fails in R, except that the request URL is changed to http.

The script is below.

library(httr)

account <- "accountname"
container <- "containrname"
filename <- "test.txt"
key <- "8FS+3i9eXx....r54Gl97F0nVwyDcV7lXbcWhmQ=="
object <- "Hello World" 

url <- paste0("http://", account, ".blob.core.windows.net/", container, "/", filename)
requestdate <- format(Sys.time(),"%a, %d %b %Y %H:%M:%S %Z", tz="GMT")  

content_length <- nchar(object, type = "bytes")  

signature_string <- paste0("PUT", "\n",            # HTTP Verb
                       "\n",                   # Content-Encoding  
                       "\n",                   # Content-Language
                       content_length, "\n",   # Content-Length
                       "\n",                   # Content-MD5
                       "text/plain", "\n",     # Content-Type
                       "\n",                   # Date
                       "\n",                   # If-Modified-Since
                       "\n",                   # If-Match  
                       "\n",                   # If-None-Match
                       "\n",                   # If-Unmodified-Since
                       "\n",                   # Range
                       # Here comes the Canonicalized Headers
                       "x-ms-blob-type:BlockBlob","\n",
                       "x-ms-date:",requestdate,"\n",
                       "x-ms-version:2015-02-21","\n",
                       # Here comes the Canonicalized Resource
                       "/",account, "/",container,"/", filename)

headerstuff <- add_headers(Authorization = paste0("SharedKey ", account, ":",
                   RCurl::base64(digest::hmac(key = RCurl::base64Decode(key, mode = "raw"),
                   object = enc2utf8(signature_string),
                   algo = "sha256", raw = TRUE))),
                   `Content-Length` = content_length,
                   `x-ms-date`= requestdate,
                   `x-ms-version`= "2015-02-21",
                   `x-ms-blob-type`="BlockBlob",
                   `Content-Type`="text/plain")


content(PUT(url, config = headerstuff, body = object, verbose()), as = "text")
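Since the question asks about reading as well as writing, here is a rough sketch of the matching read (Get Blob) request, built the same way as the PUT script above. The function names get_signature_string and read_blob_text are hypothetical, and this is untested inside Azure ML: over https it may hit the same SSL error described above. It assumes the httr, digest and RCurl packages are installed, and it adds a 30-second timeout so that a stalled request fails with an error instead of running indefinitely.

```r
# Hypothetical helpers (illustration only); assumes x-ms-version 2015-02-21.

# String-to-sign for Get Blob: no body, so Content-Length and Content-Type stay empty.
get_signature_string <- function(account, container, filename, requestdate) {
  paste0("GET", "\n",
         strrep("\n", 11),                      # 11 empty standard header fields
         "x-ms-date:", requestdate, "\n",       # canonicalized headers
         "x-ms-version:2015-02-21", "\n",
         "/", account, "/", container, "/", filename)  # canonicalized resource
}

# Download a blob and return its content as text.
# Requires the httr, digest and RCurl packages at call time.
read_blob_text <- function(account, container, filename, key, scheme = "https") {
  url <- paste0(scheme, "://", account, ".blob.core.windows.net/",
                container, "/", filename)
  requestdate <- format(Sys.time(), "%a, %d %b %Y %H:%M:%S %Z", tz = "GMT")

  # HMAC-SHA256 over the string-to-sign, keyed with the decoded account key.
  signature <- RCurl::base64(digest::hmac(
    key = RCurl::base64Decode(key, mode = "raw"),
    object = enc2utf8(get_signature_string(account, container, filename, requestdate)),
    algo = "sha256", raw = TRUE))

  response <- httr::GET(url,
                        httr::add_headers(
                          Authorization = paste0("SharedKey ", account, ":", signature),
                          `x-ms-date` = requestdate,
                          `x-ms-version` = "2015-02-21"),
                        httr::timeout(30))      # fail instead of hanging
  httr::stop_for_status(response)               # raise an R error on HTTP 4xx/5xx
  httr::content(response, as = "text", encoding = "UTF-8")
}
```

A call would look like read_blob_text(account, container, filename, key), using the same placeholder variables as in the script above.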
