
SageMaker Studio takes forever to load data from S3

I am using the following script to load a CSV file of around 2 GB, and after 24 hours nothing has happened. Am I doing something wrong?

import pandas as pd
import spacy

nlp = spacy.load('en_core_sci_lg')
bundle = ''
pattern = ""
print('start running')
column_names = ["Origina_subject", "Predicted_subject", "Original_object",
                "Predicted_object", 'original_sent']
final_list = []
# bucket and file_name are defined earlier in the notebook
data_location = 's3://{}/{}'.format(bucket, file_name)
data = pd.read_csv(data_location)

print('finish loading')

This is the output I am getting; it clearly never gets past the load:

arn:aws:iam::0*********
wait
start running

You can download the file to a temporary in-memory buffer using the boto3 client and read it with pandas:

# Read dataframe from S3 via an in-memory buffer
import boto3
import pandas as pd
from io import BytesIO

s3 = boto3.resource('s3')
bucket = 'your-bucket'
key = 'some/key/file'

with BytesIO() as data:
    s3.Bucket(bucket).download_fileobj(key, data)  # stream the object into memory
    data.seek(0)  # move back to the beginning after writing
    df = pd.read_csv(data)
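
For a file around 2 GB you may also want to avoid holding the whole download in a single buffer. Here is a minimal sketch of an alternative, using the same placeholder bucket and key: download the object to a local temporary file with the boto3 client, then let pandas read it in chunks.

# Alternative sketch for large files: download to a local temporary file,
# then read it in chunks so the full CSV never sits in one in-memory buffer.
import tempfile

import boto3
import pandas as pd

s3 = boto3.client('s3')
bucket = 'your-bucket'  # placeholder
key = 'some/key/file'   # placeholder

with tempfile.NamedTemporaryFile(suffix='.csv') as tmp:
    s3.download_file(bucket, key, tmp.name)  # save the object to local disk
    # Iterate 100k rows at a time instead of loading everything at once
    chunks = pd.read_csv(tmp.name, chunksize=100_000)
    df = pd.concat(chunks, ignore_index=True)

If even the download step hangs, the problem is more likely the Studio instance's network path or the IAM role's access to the bucket than pandas itself.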
