A new object gets created when the export task runs: exporting CloudWatch Logs to an S3 bucket
I'm trying to move CloudWatch logs from a log group to an S3 bucket.
In the bucket I have three objects (folders): Test, Dev, Prod.
When I create an export task specifying, for example, the Dev folder as the S3 bucket prefix, the log streams are exported successfully, but the problem is that another folder named after the task ID is created inside the Dev folder as a subfolder!
Is there any way to put the log streams directly into the Dev folder, or is this uncontrolled behavior?
There is no way to send the logs as individual files using an export task, because from AWS's point of view you export a collection of items.
Another problem is that an export task cannot be automated; it is a manual step. (This is not suitable if you want to repeat the operation for all your CloudWatch logs, or whenever new logs are generated.)
Another solution:
You could implement your own AWS Lambda function, triggered on a schedule, to export only the files from every CloudWatch log group you want.
Also, stored logs can grow larger and larger over time, and you don't usually need to retrieve them on short notice, so S3 alone is not a great long-term solution from a cost perspective. You could use Glacier for this purpose, after transferring your logs to S3, by using a storage class transition.
Corner case: AWS only allows one export task running per account. This means that if the Lambda function tries to export multiple log groups at once, you will get a LimitExceededException error.
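One way to cope with that limit is a small retry wrapper; the helper name and timings below are assumptions, not an official pattern:

```python
import time

def retry_on(exc_type, fn, max_attempts=10, backoff=30):
    """Call fn(), retrying whenever it raises exc_type."""
    last = None
    for _ in range(max_attempts):
        try:
            return fn()
        except exc_type as exc:
            last = exc
            time.sleep(backoff)  # wait for the running export task to finish
    raise last
```

You could then call, for example, `retry_on(logs.exceptions.LimitExceededException, lambda: logs.create_export_task(**params))`.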
Please refer to this tutorial for how to implement such a solution.
Based on your comment on the answer:
No, there is no way for you to copy the logs directly from the log group as files. By default, AWS uses the task_id of the export as the name of the collection of logs, and this cannot be changed, as it is uncontrolled behavior.
Also, a sub-directory will always be generated, because you need to set the destinationPrefix; if you don't specify it in the request, the default is exportedlogs. For example, using boto3's CloudWatch Logs client:
```python
response = logs.create_export_task(
    logGroupName=log_group_name,
    fromTime=from_to_time,
    to=export_to_time,
    destination=S3_BUCKET,
    destinationPrefix=custom_task_id.strip("/")
)
print("Task created: %s" % response['taskId'])
```
But you can copy the log files out of the created sub-directory after the CreateExportTask API finishes, and then delete every file you copied. Before each deletion you need to list all the items, so you know when to stop and when to delete the sub-directory itself.
To summarize:
1. List the items inside the bucket sub-directory.
2. Copy each file to a new key (removing the taskId from the file name and keeping the destinationPrefix).
3. Delete the original.
4. Iterate until you have copied and deleted all the logs inside the taskId directory in a single run of your Lambda, OR invoke your Lambda multiple times based on the number of logs inside the taskId sub-directory.
5. Delete the whole sub-directory (the taskId directory).
Here is sample code using boto3:
```python
import traceback
import boto3

s3 = boto3.client('s3')

BUCKET_NAME = '<Bucket Name>'
LOG_GROUP_NAME = 'export-task-test'

def copy_object(src_key, dst_key):
    try:
        s3.copy_object(
            Bucket=BUCKET_NAME,
            CopySource='%s/%s' % (BUCKET_NAME, src_key),
            Key=dst_key
        )
        result = True
    except Exception:
        traceback.print_exc()
        result = False
    return result

def delete_object(src_key):
    try:
        s3.delete_object(
            Bucket=BUCKET_NAME,
            Key=src_key
        )
        result = True
    except Exception:
        traceback.print_exc()
        result = False
    return result

def cleanup_objects(src_key):
    # Delete the old object and the aws-logs-write-test object
    return delete_object(src_key) and delete_object('exportedlogs/aws-logs-write-test')

def move_log(src_key):
    # Build the destination key: <LogGroupName>/<LogStreamName>/<LogName>
    group_stream_log_list = [LOG_GROUP_NAME] + src_key.split('/')[-2:]
    dst_key = '/'.join(group_stream_log_list)
    return copy_object(src_key, dst_key)

def move_log_and_cleanup_objects(record):
    src_key = record['s3']['object']['key']
    event_name_category = record['eventName'].split(':')[0]
    if event_name_category != 'ObjectCreated':
        return 'Skipped: %s %s' % (event_name_category, src_key)
    if not move_log(src_key):
        return 'Failed to move %s' % src_key
    if not cleanup_objects(src_key):
        return 'Failed to cleanup %s' % src_key
    return 'Successfully moved %s' % src_key

def lambda_handler(event, context):
    record = event['Records'][0]
    result = move_log_and_cleanup_objects(record)
    print(result)
    return result
```
If your logs are not that intensive, please refer to this detailed solution for how to implement the above using Lambda.
This may be an intensive task depending on the number of logs you have, especially if you need to rename the whole set of logs you exported to your S3 bucket at once (stopping at every log to change its name adds overhead when you have thousands of logs). In that case you may need to move to another serverless architecture, using AWS ECS and AWS Fargate: put all your code in a container and trigger it with an ECS Scheduled Task, which starts the Fargate container at certain times of day. The container stops once the process exits, so you get the effectively unlimited timeout this solution provides, at the best cost you can get for this task.