How to read data from S3 using python in Azure ML

import boto3
import io
import pandas as pd

# The entry point function can contain up to two input arguments:
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
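def azureml_main(dataframe1 = None, dataframe2 = None):
    # Create an S3 client with explicit credentials (values redacted).
    s3 = boto3.client('s3',
                      aws_access_key_id='REMOVED',
                      aws_secret_access_key='REMOVED')
    # Fetch the object and parse its raw bytes as CSV.
    obj = s3.get_object(Bucket='bucket', Key='data.csv000')
    df = pd.read_csv(io.BytesIO(obj['Body'].read()))
    # The module expects a sequence of DataFrames, hence the trailing comma.
    return df,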

I'm trying to read data from S3 using the Execute Python Script module. I downloaded the boto3 package, converted it to a zip, uploaded it, and connected that .zip to the third input port of the module. When I run this code, I receive an error stating that botocore is not installed. Has anyone been able to read directly from S3 into Azure ML Studio? I've tried the R script module, which also fails, so now I'm trying Python.
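The "botocore is not installed" error is what you would expect if the zip contains only boto3, because boto3 imports botocore when it loads. Below is a minimal sketch, assuming the dependencies are already installed as pure-Python packages in a local site-packages folder, that writes boto3 and the packages it pulls in into one zip with each package at the archive root. The function name `bundle_packages` and the dependency list are my own assumptions, and I have not confirmed that Azure ML Studio will import everything from such a bundle.

```python
import os
import zipfile

# Assumed top-level import names: boto3 needs botocore, s3transfer and
# jmespath; botocore needs dateutil (python-dateutil) and urllib3;
# python-dateutil needs six (a single-file module).
DEFAULT_PACKAGES = ("boto3", "botocore", "s3transfer", "jmespath",
                    "dateutil", "urllib3", "six")


def bundle_packages(site_packages, zip_path, packages=DEFAULT_PACKAGES):
    """Hypothetical helper: zip each package directory (or single-file
    module) found under site_packages, keeping paths relative to
    site_packages so every package sits at the root of the archive.

    Returns the names that could not be found."""
    missing = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in packages:
            pkg_dir = os.path.join(site_packages, name)
            module_file = pkg_dir + ".py"
            if os.path.isdir(pkg_dir):
                for root, _dirs, files in os.walk(pkg_dir):
                    for fname in files:
                        if fname.endswith(".pyc"):
                            continue  # skip compiled bytecode caches
                        full = os.path.join(root, fname)
                        zf.write(full, os.path.relpath(full, site_packages))
            elif os.path.isfile(module_file):
                zf.write(module_file, name + ".py")
            else:
                missing.append(name)
    return missing
```

For example, `bundle_packages("/path/to/site-packages", "boto3_bundle.zip")` (hypothetical paths) returns the names it could not find, so an empty list means every listed package made it into the zip.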

Since the boto3 package has dependencies (botocore among them), some of them even cloned from git, I don't think Azure ML Studio can use it. According to the note in their documentation, it would be easier to switch to Azure ML Workbench, since it handles Python packages much better.
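To see which dependencies are actually missing inside the module, a small check like the one below could be run from `azureml_main`. It is a sketch: `missing_modules` is a hypothetical helper, and the tuple lists the top-level import names I assume the boto3 stack needs (python-dateutil is imported as `dateutil`).

```python
import importlib.util

# Assumed top-level import names for boto3 and its runtime dependencies.
BOTO3_STACK = ("boto3", "botocore", "s3transfer", "jmespath",
               "dateutil", "urllib3", "six")


def missing_modules(names=BOTO3_STACK):
    """Return the names the current interpreter cannot locate.

    find_spec returns None for a top-level module that is not on sys.path,
    and it does not execute the module's code."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Inside the module you could return `pd.DataFrame({'missing': missing_modules()}),` so the result shows up on the output port.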

Another option, if you need to use Azure ML Studio, is to copy from S3 into Azure Blob Storage, which ML Studio has great support for.
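A minimal sketch of that copy step, assuming it runs somewhere with boto3 and the `azure-storage-blob` (v12) package installed, i.e. outside ML Studio. The function takes already-constructed clients so it stays easy to test; its name and parameters are hypothetical, and it reads the whole object into memory, which is only reasonable for modest file sizes.

```python
def copy_s3_object_to_blob(s3_client, blob_service_client, bucket, key,
                           container, blob_name=None):
    """Hypothetical helper: copy one S3 object into an Azure Blob Storage
    container.

    s3_client: a boto3 S3 client, e.g. boto3.client("s3").
    blob_service_client: an azure.storage.blob.BlobServiceClient.
    Returns the number of bytes copied.
    """
    # Download the whole S3 object into memory.
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    # Reuse the S3 key as the blob name unless one is given.
    blob_client = blob_service_client.get_blob_client(
        container=container, blob=blob_name or key)
    # overwrite=True replaces an existing blob with the same name.
    blob_client.upload_blob(body, overwrite=True)
    return len(body)
```

You would create the clients with `boto3.client('s3')` and `BlobServiceClient.from_connection_string(conn_str)` from `azure.storage.blob`, where `conn_str` is your storage account's connection string; ML Studio can then read the blob through its own storage support.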

Not much of an answer, but I'm afraid you've hit a limitation of Azure ML Studio.
