
Using Lambda to Upload to S3

I created a Lambda function that downloads data from S3, performs a merge, and then re-uploads the result back to S3, but I keep getting this error:

{ "errorMessage": "2020-05-18T23:23:27.556Z 37233f48-18ea-43eb-9030-3e8a2bf62048 Task timed out after 3.00 seconds" }

When I remove lines 45 through 58 it works just fine.

[screenshot of the error] Full code: https://ideone.com/RvOmPS

import pandas as pd
import numpy as np
import time
from io import StringIO  # python3; python2: BytesIO
import boto3
import s3fs
from botocore.exceptions import NoCredentialsError

def lambda_handler(event, context):

    # Dataset 1
    # Loading the data
    df1 = pd.read_csv("https://i...content-available-to-author-only...s.com/Minimum+Wage+Data.csv", encoding='unicode_escape')

    # Renaming the columns.
    df1.rename(columns={'High.Value': 'min_wage_by_law', 'Low.Value': 'min_wage_real'}, inplace=True)

    # Removing all unneeded values.
    df1 = df1.drop(['Table_Data', 'Footnote', 'High.2018', 'Low.2018'], axis=1)
    df1 = df1.loc[df1['Year'] > 1969].copy()

    # ---------------------------------

    # Dataset 2
    # Loading from the debt S3 bucket
    df2 = pd.read_csv("https://i...content-available-to-author-only...s.com/USGS_Final_File.csv")

    # Filtering to the range between 1969 and 2018.
    df2 = df2.loc[df2['Year'] > 1969].copy()
    df2 = df2.loc[df2['Year'] < 2018].copy()
    df2.rename(columns={'Real State Growth %': 'Real State Growth', 'Population (million)': 'Population Mil'}, inplace=True)

    # Cleaning the data
    df2['State Debt'] = df2['State Debt'].str.replace(',', '')
    df2['Local Debt'] = df2['Local Debt'].str.replace(',', '')
    df2["State and Local Debt"] = df2["State and Local Debt"].str.replace(',', '')
    df2["Gross State Product"] = df2["Gross State Product"].str.replace(',', '')

    # Cast to floating point
    df2[["State Debt", "Local Debt", "State and Local Debt", "Gross State Product"]] = df2[["State Debt", "Local Debt", "State and Local Debt", "Gross State Product"]].apply(pd.to_numeric)

    # --------------------------------------------
    # Merge the data through an inner join.
    full = pd.merge(df1, df2, on=['State', 'Year'])
    # --------------------------------------------
    filename = '/tmp/'  # specify location of s3:/{my-bucket}/
    file = 'debt_and_wage'  # name of file
    datetime = time.strftime("%Y%m%d%H%M%S")  # timestamp
    filenames3 = "%s%s%s.csv" % (filename, file, datetime)  # name of the filepath and csv file

    full.to_csv(filenames3, header=True)

    ## Saving it on AWS

    s3 = boto3.resource('s3', aws_access_key_id='accesskeycantshare', aws_secret_access_key='key')

    s3.meta.client.upload_file(filenames3, 'information-arch', file + datetime + '.csv')

Your default Lambda execution timeout is 3 seconds. Please increase it to what suits your task:

Timeout – The amount of time that Lambda allows a function to run before stopping it. The default is 3 seconds. The maximum allowed value is 900 seconds.

You should increase the timeout of your Lambda function. The default behavior of a newly created function is to terminate after 3 seconds.
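If you prefer to change it outside the console, here is a minimal sketch using boto3's Lambda client; the function name 'merge-and-upload' is a hypothetical placeholder for your own:

import boto3

# Assumption: 'merge-and-upload' stands in for your function's actual name.
lambda_client = boto3.client('lambda')

# Raise the execution timeout from the 3-second default to 60 seconds.
lambda_client.update_function_configuration(
    FunctionName='merge-and-upload',
    Timeout=60  # seconds; Lambda allows up to 900
)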

Thanks everybody for the help. I just increased the timeout and it worked just fine; I did it through here:

[screenshot of the timeout setting in the Lambda console]

If the job is big you can also try increasing the memory of the function. For now, increase the timeout of the function.
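Memory is adjusted through the same configuration API, and since Lambda allocates CPU in proportion to memory, a larger setting can also speed up the pandas merge itself. A sketch, again under the hypothetical function name used above:

import boto3

lambda_client = boto3.client('lambda')

# Assumption: 'merge-and-upload' is a placeholder function name.
# MemorySize is in MB; 128 MB is the default for a new function.
lambda_client.update_function_configuration(
    FunctionName='merge-and-upload',
    MemorySize=512
)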
