
AWS CloudWatch Log - Is it possible to export existing log data from it?

I have managed to push my application logs to AWS CloudWatch using the AWS CloudWatch Log agent. But the CloudWatch web console does not seem to provide a button to download/export the log data from it.

Any idea how I can achieve this goal?

The latest AWS CLI has a CloudWatch Logs CLI that allows you to download the logs as JSON, a text file, or any other output supported by the AWS CLI.

For example, to get the first 1 MB (up to 10,000 log entries) from the stream a in group A into a text file, run:

aws logs get-log-events \
   --log-group-name A --log-stream-name a \
   --output text > a.log

The command is currently limited to a response size of at most 1 MB (up to 10,000 records per request); if you have more, you need to implement your own page-stepping mechanism using the --next-token parameter. I expect that in the future the CLI will also allow a full dump in a single command.
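A minimal sketch of such a paging loop, assuming jq is installed and reusing the placeholder group A and stream a from above. get-log-events signals the end of a stream by returning the same forward token twice in a row:

TOKEN=""
while true; do
  if [ -z "$TOKEN" ]; then
    RESP=$(aws logs get-log-events --log-group-name A \
      --log-stream-name a --start-from-head --output json)
  else
    RESP=$(aws logs get-log-events --log-group-name A \
      --log-stream-name a --start-from-head \
      --next-token "$TOKEN" --output json)
  fi
  echo "$RESP" | jq -r '.events[].message' >> a.log
  NEWTOKEN=$(echo "$RESP" | jq -r '.nextForwardToken')
  # The same token twice in a row means there are no more events.
  [ "$NEWTOKEN" = "$TOKEN" ] && break
  TOKEN="$NEWTOKEN"
done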

Update

Here's a small Bash script to list events from all streams in a specific group, since a specified time:

#!/bin/bash
function dumpstreams() {
  # List the group's streams ordered by last event time; skip streams whose
  # last event is older than $starttime, then dump each remaining stream.
  aws $AWSARGS logs describe-log-streams \
    --order-by LastEventTime --log-group-name $LOGGROUP \
    --output text | while read -a st; do
      [ "${st[4]}" -lt "$starttime" ] && continue
      stname="${st[1]}"
      echo ${stname##*:}
    done | while read stream; do
      aws $AWSARGS logs get-log-events \
        --start-from-head --start-time $starttime \
        --log-group-name $LOGGROUP --log-stream-name $stream --output text
    done
}

AWSARGS="--profile myprofile --region us-east-1"
LOGGROUP="some-log-group"
TAIL=
starttime=$(date --date "-1 week" +%s)000
nexttime=$(date +%s)000
dumpstreams
if [ -n "$TAIL" ]; then
  while true; do
    starttime=$nexttime
    nexttime=$(date +%s)000
    sleep 1
    dumpstreams
  done
fi

As for that last part: if you set TAIL, the script will continue to fetch log events and will report newer events as they come in (with some expected delay).

There is also a Python project called awslogs that allows you to get the logs: https://github.com/jorgebastida/awslogs

It supports operations like:

list log groups:

$ awslogs groups

list streams for a given log group:

$ awslogs streams /var/log/syslog

get the log records from all streams:

$ awslogs get /var/log/syslog

get the log records from a specific stream:

$ awslogs get /var/log/syslog stream_A

and much more (filtering by time period, watching log streams, ...).
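For example (flags as documented in the project's README; check awslogs get --help for your installed version):

$ awslogs get /var/log/syslog --start='2h ago'
$ awslogs get /var/log/syslog ALL --watch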

I think this tool might help you do what you want.

It seems AWS has added the ability to export an entire log group to S3.

[Image: Export to S3 menu]

[Image: Export to S3 form]

You'll need to set up permissions on the S3 bucket to allow CloudWatch to write to it. Add the following to your bucket policy, replacing the region with your region and the bucket name with your bucket name.

    {
        "Effect": "Allow",
        "Principal": {
            "Service": "logs.us-east-1.amazonaws.com"
        },
        "Action": "s3:GetBucketAcl",
        "Resource": "arn:aws:s3:::tsf-log-data"
    },
    {
        "Effect": "Allow",
        "Principal": {
            "Service": "logs.us-east-1.amazonaws.com"
        },
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::tsf-log-data/*",
        "Condition": {
            "StringEquals": {
                "s3:x-amz-acl": "bucket-owner-full-control"
            }
        }
    }

Details can be found in Step 2 of this AWS doc.
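For reference, the same export can also be started from the CLI with create-export-task. A sketch reusing the bucket name from the policy above (the task name, group name, and the --from/--to values, in milliseconds since the epoch, are placeholders):

aws logs create-export-task \
    --task-name my-export \
    --log-group-name my-log-group \
    --from 1609459200000 \
    --to 1612137600000 \
    --destination tsf-log-data \
    --destination-prefix my-log-group

The export runs asynchronously; you can poll its status with aws logs describe-export-tasks.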

I would add a one-liner to get all logs for a stream:

aws logs get-log-events --log-group-name my-log-group --log-stream-name my-log-stream | grep '"message":' | awk -F '"' '{ print $(NF-1) }' > my-log-group_my-log-stream.txt

Or in a slightly more readable format:

aws logs get-log-events \
    --log-group-name my-log-group\
    --log-stream-name my-log-stream \
    | grep '"message":' \
    | awk -F '"' '{ print $(NF-1) }' \
    > my-log-group_my-log-stream.txt

And you can make a handy script out of it that is admittedly less powerful than @Guss's, but simple enough. I saved it as getLogs.sh and invoke it with ./getLogs.sh log-group log-stream

#!/bin/bash

if [[ "${#}" != 2 ]]
then
    echo "This script requires two arguments!"
    echo
    echo "Usage :"
    echo "${0} <log-group-name> <log-stream-name>"
    echo
    echo "Example :"
    echo "${0} my-log-group my-log-stream"

    exit 1
fi

OUTPUT_FILE="${1}_${2}.log"
aws logs get-log-events \
    --log-group-name "${1}"\
    --log-stream-name "${2}" \
    | grep '"message":' \
    | awk -F '"' '{ print $(NF-1) }' \
    > "${OUTPUT_FILE}"

echo "Logs stored in ${OUTPUT_FILE}"

Apparently there isn't an out-of-the-box way in the AWS Console to download CloudWatch Logs. Perhaps you can write a script to perform the CloudWatch Logs fetch using the SDK/API.

The good thing about CloudWatch Logs is that you can retain the logs indefinitely (Never Expire), unlike CloudWatch metrics, which are kept for only 14 days. This means you can run the script at a monthly/quarterly frequency rather than on demand.

More information about the CloudWatch Logs API: http://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/Welcome.html and http://awsdocs.s3.amazonaws.com/cloudwatchlogs/latest/cwl-api.pdf
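As a sketch of what such a script could look like using only the CLI (assuming AWS CLI v2, which auto-paginates filter-log-events, jq installed, and a hypothetical group name my-log-group):

# Dump the messages from every stream in the group for the last 24 hours.
aws logs filter-log-events \
    --log-group-name my-log-group \
    --start-time $(($(date +%s) - 86400))000 \
    --output json \
    | jq -r '.events[].message' > my-log-group.log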

You can now perform exports via the CloudWatch Management Console with the new CloudWatch Logs Insights page. Full documentation here: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_ExportQueryResults.html. I had already started ingesting my Apache logs into CloudWatch as JSON, so YMMV if you haven't set that up in advance.

Add Query to Dashboard or Export Query Results

After you run a query, you can add the query to a CloudWatch dashboard, or copy the results to the clipboard.

Queries added to dashboards automatically re-run every time you load the dashboard and every time that the dashboard refreshes. These queries count toward your limit of four concurrent CloudWatch Logs Insights queries.

To add query results to a dashboard:

1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
2. In the navigation pane, choose Insights.
3. Choose one or more log groups and run a query.
4. Choose Add to dashboard.
5. Select the dashboard, or choose Create new to create a new dashboard for the query results.
6. Choose Add to dashboard.

To copy query results to the clipboard:

1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
2. In the navigation pane, choose Insights.
3. Choose one or more log groups and run a query.
4. Choose Actions, Copy query results.

I found the AWS documentation to be complete and accurate. It lays down the steps for exporting logs from CloudWatch to S3: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/S3ExportTasks.html

Adapted @Guss's answer to macOS. As I am not really a bash guy, I had to use Python to convert dates to a human-readable form.

runawslog -1w gets the last week, and so on:

runawslog() { sh awslogs.sh $1 | grep "EVENTS" | python parselogline.py; }

awslogs.sh:

#!/bin/bash
#set -x
function dumpstreams() {
  aws $AWSARGS logs describe-log-streams \
    --order-by LastEventTime --log-group-name $LOGGROUP \
    --output text | while read -a st; do 
      [ "${st[4]}" -lt "$starttime" ] && continue
      stname="${st[1]}"
      echo ${stname##*:}
    done | while read stream; do
      aws $AWSARGS logs get-log-events \
        --start-from-head --start-time $starttime \
        --log-group-name $LOGGROUP --log-stream-name $stream --output text
    done
}
AWSARGS=""
#AWSARGS="--profile myprofile --region us-east-1"
LOGGROUP="/aws/lambda/StockTrackFunc"
TAIL=
FROMDAT=$1
starttime=$(date -v ${FROMDAT} +%s)000  # BSD/macOS date: e.g. -1w = one week ago
nexttime=$(date +%s)000
dumpstreams
if [ -n "$TAIL" ]; then
  while true; do
    starttime=$nexttime
    nexttime=$(date +%s)000
    sleep 1
    dumpstreams
  done
fi

parselogline.py:

import sys
import datetime

# Read the tab-separated event lines produced by `aws logs get-log-events
# --output text` and print each message with a human-readable timestamp.
dat = sys.stdin.read()
for k in dat.split('\n'):
    d = k.split('\t')
    if len(d) < 3:
        continue
    # Re-join the message in case it contained tabs itself.
    d[2] = '\t'.join(d[2:])
    print(str(datetime.datetime.fromtimestamp(int(d[1]) / 1000)) + '\t' + d[2])

export LOGGROUPNAME=[SOME_LOG_GROUP_NAME]
for LOGSTREAM in `aws --output text logs describe-log-streams --log-group-name ${LOGGROUPNAME} | awk '{print $7}'`; do
  aws --output text logs get-log-events --log-group-name ${LOGGROUPNAME} --log-stream-name ${LOGSTREAM} >> ${LOGGROUPNAME}_output.txt
done

I had a similar use case where I had to download all the streams for a given log group. See if this script helps.

#!/bin/bash

if [[ "${#}" != 1 ]]
then
    echo "This script requires one argument!"
    echo
    echo "Usage :"
    echo "${0} <log-group-name>"

    exit 1
fi

streams=`aws logs describe-log-streams --log-group-name "${1}"`

for stream in $(jq '.logStreams | keys | .[]' <<< "$streams"); do
    record=$(jq -r ".logStreams[$stream]" <<< "$streams")
    streamName=$(jq -r ".logStreamName" <<< "$record")
    echo "Downloading ${streamName}"
    aws logs get-log-events --log-group-name "${1}" --log-stream-name "$streamName" --output json > "${stream}.log"
    echo "Completed download:: ${streamName}"
done

You have to pass the log group name as an argument.

E.g.: bash <name_of_the_bash_file>.sh <group_name>

The other answers were not useful with AWS Lambda logs, since they create many log streams and I just wanted to dump everything from the last week. I finally found the following command to be what I needed:

aws logs tail --since 1w LOG_GROUP_NAME > output.log

Note that LOG_GROUP_NAME is the Lambda function path (e.g. /aws/lambda/FUNCTION_NAME), and you can replace the --since argument with a variety of durations (1w = 1 week, 5m = 5 minutes, etc.).
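If you want to keep following new events as they arrive, recent AWS CLI v2 builds also support a follow mode and compact formatting (a hedged example; check aws logs tail help on your version):

aws logs tail /aws/lambda/FUNCTION_NAME --since 1w --follow --format short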

Inspired by saputkin, I have created a Python script that downloads all the logs for a log group in a given time period.

The script itself: https://github.com/slavogri/aws-logs-downloader.git

In case there are multiple log streams for that period, multiple files will be created. Downloaded files will be stored in the current directory and will be named after the log streams that have log events in the given time period. (If the group name contains forward slashes, they will be replaced by underscores. Each file will be overwritten if it already exists.)

Prerequisite: You need to be logged in to your AWS profile. The script itself is going to use, on your behalf, the AWS command-line APIs "aws logs describe-log-streams" and "aws logs get-log-events".

Usage example: python aws-logs-downloader -g /ecs/my-cluster-test-my-app -t "2021-09-04 05:59:50 +00:00" -i 60

optional arguments:
   -h, --help         show this help message and exit
   -v, --version      show program's version number and exit
   -g , --log-group   (required) Log group name for which the log stream events needs to be downloaded
   -t , --end-time    (default: now) End date and time of the downloaded logs in format: %Y-%m-%d %H:%M:%S %z (example: 2021-09-04 05:59:50 +00:00)
   -i , --interval    (default: 30) Time period in minutes before the end-time. This will be used to calculate the time since which the logs will be downloaded.
   -p , --profile     (default: dev) The aws profile that is logged in, and on behalf of which the logs will be downloaded.
   -r , --region      (default: eu-central-1) The aws region from which the logs will be downloaded.

Please let me know if it was useful to you. :)

After I did it, I learned that there is another option using Boto3: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html#CloudWatchLogs.Client.get_log_events

Still, the command-line API seems to me like a good option.
