SageMaker Studio takes forever to load data from S3
I am using the following script to load a CSV file of around 2 GB, and after 24 hours nothing happens. Am I doing something wrong?
import spacy
import pandas as pd

nlp = spacy.load('en_core_sci_lg')
bundle = ''
pattern = ""
print('start running')
column_names = ["Origina_subject", "Predicted_subject", "Original_object", "Predicted_object", 'original_sent']
final_list = []
# bucket and file_name are defined elsewhere; reading an s3:// path
# directly with pandas requires the s3fs package to be installed
data_location = 's3://{}/{}'.format(bucket, file_name)
data = pd.read_csv(data_location)
print('finish loading')
This is the response I am getting; clearly it never gets past the load:
arn:aws:iam::0*********
wait
start running
You can download the file into an in-memory buffer using the boto3 client and read it with pandas:
# Read dataframe
import boto3
import pandas as pd
from io import BytesIO

s3 = boto3.resource('s3')
bucket = 'your-bucket'
key = 'some/key/file'

with BytesIO() as data:
    s3.Bucket(bucket).download_fileobj(key, data)
    data.seek(0)    # move back to the beginning after writing
    df = pd.read_csv(data)
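For a file this large you can also avoid holding everything in memory at once by reading in chunks with pandas' `chunksize` parameter; with `s3fs` installed, the same call accepts an `s3://bucket/key.csv` path directly. A minimal local sketch (the CSV here is sample data standing in for the real file):

```python
import pandas as pd
from io import StringIO

# Small in-memory CSV standing in for the 2 GB file on S3; with s3fs
# installed, pd.read_csv accepts 's3://your-bucket/some/key.csv' the same way.
csv_text = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

total_rows = 0
# chunksize makes read_csv return an iterator of DataFrames instead of
# loading the whole file into memory at once
for chunk in pd.read_csv(StringIO(csv_text), chunksize=4):
    total_rows += len(chunk)

print(total_rows)  # 10
```

Processing chunk by chunk keeps peak memory bounded by the chunk size rather than the file size, which also makes it easier to spot whether the hang is in the download or in the parsing.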