
A new object gets created when the export task runs: exporting CloudWatch logs to an S3 bucket

I'm trying to move CloudWatch logs from a log group to an S3 bucket.

In the bucket I have three objects (folders): Test, Dev, Prod.

When I create an export task specifying, for example, the dev folder as the S3 bucket prefix, the log streams are successfully exported, but the problem is that another folder named after the taskID is created inside the dev folder as a subfolder!


Is there any way to put the log streams directly into the dev folder, or is this uncontrolled behavior?

There is no way to send the logs as files using only an export task, since from AWS's point of view you are exporting a collection of items. Another problem is that an export task cannot be run automatically on its own; it is just a manual step (not suitable if you want to repeat this operation for all your CloudWatch logs, or whenever new logs are generated).

Another solution:

You could implement your own AWS Lambda function, triggered on a schedule, to export only the CloudWatch log groups you want.
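
A minimal sketch of such a scheduled Lambda, assuming an EventBridge (CloudWatch Events) schedule invokes it and that the bucket, prefix and log group names (shown here as environment variables with placeholder defaults) are your own:

import os
import time
import boto3

logs = boto3.client('logs')

# Hypothetical names -- replace with your own bucket, prefix and log group.
S3_BUCKET = os.environ.get('S3_BUCKET', 'my-logs-bucket')
S3_PREFIX = os.environ.get('S3_PREFIX', 'dev')
LOG_GROUP = os.environ.get('LOG_GROUP', '/aws/lambda/my-app')


def lambda_handler(event, context):
    # Export roughly the last 24 hours of the log group to S3.
    now_ms = int(time.time() * 1000)
    day_ms = 24 * 60 * 60 * 1000
    response = logs.create_export_task(
        logGroupName=LOG_GROUP,
        fromTime=now_ms - day_ms,
        to=now_ms,
        destination=S3_BUCKET,
        destinationPrefix=S3_PREFIX,
    )
    return response['taskId']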

Also, stored logs will keep growing over time and you don't always need to retrieve them at short notice, so from a cost perspective S3 alone is not a great solution for storing logs in the long run; after you transfer your logs to AWS S3, you could move them to Glacier using a storage class transition.
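
As an illustration, here is a minimal sketch of such a transition using a bucket lifecycle rule with boto3; the bucket name, rule ID, prefix and 30-day threshold are assumptions, not values from the question:

import boto3

s3 = boto3.client('s3')

# Move everything under the dev/ prefix to Glacier 30 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket='my-logs-bucket',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'archive-exported-logs',
                'Filter': {'Prefix': 'dev/'},
                'Status': 'Enabled',
                'Transitions': [
                    {'Days': 30, 'StorageClass': 'GLACIER'}
                ],
            }
        ]
    },
)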

Corner case: AWS only allows one export task to run at a time per account. This means that if the Lambda function tries to export multiple log groups at once, you will get a LimitExceededException error.
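
One way to live with this limit is to serialize the exports: wait until no export task is PENDING or RUNNING before creating the next one. A sketch under that assumption (the group names, bucket, prefix and time range come from the caller):

import time
import boto3

logs = boto3.client('logs')


def wait_for_running_export_tasks(poll_seconds=15):
    # Block until no export task in this account is PENDING or RUNNING.
    while True:
        busy = any(
            logs.describe_export_tasks(statusCode=status).get('exportTasks')
            for status in ('PENDING', 'RUNNING')
        )
        if not busy:
            return
        time.sleep(poll_seconds)


def export_serially(groups, bucket, prefix, from_ms, to_ms):
    # Create one export task at a time to avoid LimitExceededException.
    task_ids = []
    for group in groups:
        while True:
            wait_for_running_export_tasks()
            try:
                response = logs.create_export_task(
                    logGroupName=group,
                    fromTime=from_ms,
                    to=to_ms,
                    destination=bucket,
                    destinationPrefix=prefix,
                )
                task_ids.append(response['taskId'])
                break
            except logs.exceptions.LimitExceededException:
                # Another task slipped in between the check and the call; wait and retry.
                continue
    return task_ids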

Please refer to this tutorial on how to implement such a solution.

Based on your comment on the answer:

No, there is no way for you to directly copy the logs from the logGroup as files: by default AWS uses the task_id of the export as the name of the collection of exported logs, and this cannot be changed, as it is uncontrolled behavior.

Also, a sub-directory will always be generated, because you need to set the destinationPrefix; if you don't specify it in the request, the default is exportedlogs. For example, using the boto3 CloudWatchLogs client:

response = logs.create_export_task(
    logGroupName=log_group_name,
    fromTime=from_to_time,
    to=export_to_time,
    destination=S3_BUCKET,
    destinationPrefix=custom_task_id.strip("/")
)
print("Task created: %s" % response['taskId'])

But you can copy the log files from the created sub-directory after the CreateExportTask API finishes and then delete every file you copy. Before that, you need to list all the items before every deletion, so you know when to stop and when to delete the sub-directory.

To summarize:

1- List the items inside the bucket sub-directory (a listing sketch is shown after the sample code below).

2- Copy each file to a new key (removing the taskId from the key and keeping the destinationPrefix).

3- Delete the original.

4- Iterate until you have copied and deleted all the logs inside the taskId directory in a single run of your lambda, OR invoke your lambda multiple times based on the number of logs inside the taskId sub-directory.

5- Delete the whole sub-directory (the taskId directory).

Here is some sample code, using boto3 (it handles S3 ObjectCreated event records for the exported objects):

import traceback
import boto3

s3 = boto3.client('s3')
BUCKET_NAME = '<Bucket Name>'
LOG_GROUP_NAME = 'export-task-test'


def copy_object(src_key, dst_key):
    try:
        s3.copy_object(
            Bucket=BUCKET_NAME,
            CopySource='%s/%s' % (BUCKET_NAME, src_key),
            Key=dst_key
        )
        result = True
    except Exception:
        traceback.print_exc()
        result = False
    return result


def delete_object(src_key):
    try:
        s3.delete_object(
            Bucket=BUCKET_NAME,
            Key=src_key
        )
        result = True
    except Exception:
        traceback.print_exc()
        result = False
    return result


def cleanup_objects(src_key):
    # Delete the old object and the aws-logs-write-test object created by the export task
    return delete_object(src_key) and delete_object('exportedlogs/aws-logs-write-test')


def move_log(src_key):
    # Rebuild the key as <LogGroupName>/<LogStreamName>/<LogName>, dropping the taskId segment
    group_stream_log_list = [LOG_GROUP_NAME] + src_key.split('/')[-2:]
    dst_key = '/'.join(group_stream_log_list)
    return copy_object(src_key, dst_key)


def move_log_and_cleanup_objects(record):
    src_key = record['s3']['object']['key']
    event_name_category = record['eventName'].split(':')[0]
    if event_name_category != 'ObjectCreated':
        return 'Skipped: %s %s' % (event_name_category, src_key)
    if not move_log(src_key):
        return 'Failed to move %s' % src_key
    if not cleanup_objects(src_key):
        return 'Failed to cleanup %s' % src_key
    return 'Successfully moved %s' % src_key


def lambda_handler(event, context):
    record = event['Records'][0]
    result = move_log_and_cleanup_objects(record)
    print(result)
    return result
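
If you prefer the list-then-copy approach from the summary over the event-driven sample above, a minimal sketch of steps 1 and 5 (listing, then finally deleting, everything under the taskId prefix) could look like this; the bucket name and prefix are placeholders:

import boto3

s3 = boto3.client('s3')

# Hypothetical values -- replace with your bucket and the exported taskId prefix.
BUCKET_NAME = 'my-logs-bucket'
TASK_PREFIX = 'dev/<taskId>/'


def list_task_objects(bucket, prefix):
    # Step 1: yield every key under the taskId sub-directory.
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            yield obj['Key']


def delete_task_directory(bucket, prefix):
    # Step 5: once every log has been copied, delete what is left under the prefix.
    keys = [{'Key': key} for key in list_task_objects(bucket, prefix)]
    # delete_objects accepts at most 1000 keys per request.
    for i in range(0, len(keys), 1000):
        s3.delete_objects(Bucket=bucket, Delete={'Objects': keys[i:i + 1000]})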

Please refer to this detailed solution for how to implement the above using Lambda if your log volume is not that heavy.

This may be an intensive task depending on how many logs you have, especially if you need to strip the TaskId from the name of every log you imported into your S3 bucket at once: touching each log to change its name adds overhead when you have thousands of them. In that case you may want to move to another serverless architecture, such as AWS ECS with AWS Fargate: put all your code in a container and trigger it with an ECS Scheduled Task, which starts the Fargate container at certain times of day. The container stops once the process exits, so this solution gives you an unlimited timeout and the best cost you can get for this task.
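
As a rough illustration, the scheduled trigger for such a Fargate task can be wired up with an EventBridge rule that has an ECS target; in this sketch every ARN, subnet id and name is a placeholder, and the ECS cluster, task definition and IAM role are assumed to already exist:

import boto3

events = boto3.client('events')

RULE_NAME = 'nightly-log-export'

# Run the container every day at 03:00 UTC.
events.put_rule(
    Name=RULE_NAME,
    ScheduleExpression='cron(0 3 * * ? *)',
    State='ENABLED',
)

# Point the rule at a Fargate task definition on an existing cluster.
events.put_targets(
    Rule=RULE_NAME,
    Targets=[
        {
            'Id': 'log-export-fargate-task',
            'Arn': 'arn:aws:ecs:eu-west-1:123456789012:cluster/log-export',
            'RoleArn': 'arn:aws:iam::123456789012:role/ecsEventsRole',
            'EcsParameters': {
                'TaskDefinitionArn': 'arn:aws:ecs:eu-west-1:123456789012:task-definition/log-export:1',
                'TaskCount': 1,
                'LaunchType': 'FARGATE',
                'NetworkConfiguration': {
                    'awsvpcConfiguration': {
                        'Subnets': ['subnet-0123456789abcdef0'],
                        'AssignPublicIp': 'ENABLED',
                    }
                },
            },
        }
    ],
)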
