How to query cloudwatch logs using boto3 in python

I have a lambda function that writes metrics to Cloudwatch. While it writes metrics, it generates some logs in a log group.

INFO:: username: simran+test@example.com ClinicID: 7667 nodename: MacBook-Pro-2.local

INFO:: username: simran+test2@example.com ClinicID: 7667 nodename: MacBook-Pro-2.local

INFO:: username: simran+test@example.com ClinicID: 7668 nodename: MacBook-Pro-2.local

INFO:: username: simran+test3@example.com ClinicID: 7667 nodename: MacBook-Pro-2.local

I would like to query AWS logs for the past x hours, where x could be anywhere between 12 and 24 hours, based on any of the params.

For example:

  1. Query Cloudwatch logs in the last 5 hours where ClinicID=7667

or

  1. Query Cloudwatch logs in the last 5 hours where ClinicID=7667 and username='simran+test@example.com'

or

  1. Query Cloudwatch logs in the last 5 hours where username='simran+test@example.com'

I am using boto3 in Python.

You can get what you want using CloudWatch Logs Insights.

You would use the start_query and get_query_results APIs: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html

To start a query, you would use the following (for use case 2 from your question; 1 and 3 are similar):

import boto3
from datetime import datetime, timedelta
import time

client = boto3.client('logs')

query = "fields @timestamp, @message | parse @message \"username: * ClinicID: * nodename: *\" as username, ClinicID, nodename | filter ClinicID = 7667 and username='simran+test@example.com'"

log_group = '/aws/lambda/NAME_OF_YOUR_LAMBDA_FUNCTION'

start_query_response = client.start_query(
    logGroupName=log_group,
    startTime=int((datetime.today() - timedelta(hours=5)).timestamp()),
    endTime=int(datetime.now().timestamp()),
    queryString=query,
)

query_id = start_query_response['queryId']

response = None

# The query can also report 'Scheduled' before it starts running,
# so keep polling until it reaches a terminal status
while response is None or response['status'] in ('Scheduled', 'Running'):
    print('Waiting for query to complete ...')
    time.sleep(1)
    response = client.get_query_results(
        queryId=query_id
    )

The response will contain your data in this format (plus some metadata):

{
  'results': [
    [
      {
        'field': '@timestamp',
        'value': '2019-12-09 17:07:24.428'
      },
      {
        'field': '@message',
        'value': 'username: simran+test@example.com ClinicID: 7667 nodename: MacBook-Pro-2.local\n'
      },
      {
        'field': 'username',
        'value': 'simran+test@example.com'
      },
      {
        'field': 'ClinicID',
        'value': '7667'
      },
      {
        'field': 'nodename',
        'value': 'MacBook-Pro-2.local\n'
      }
    ]
  ]
}
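Each row in `results` comes back as a list of `{'field': ..., 'value': ...}` pairs rather than a plain mapping. A small helper (a sketch, not part of the boto3 API) can flatten the rows into ordinary dicts keyed by field name:

```python
def rows_to_dicts(results):
    """Flatten Logs Insights rows ([{'field': ..., 'value': ...}, ...]) into dicts."""
    return [{col["field"]: col["value"] for col in row} for row in results]

# Example with a row shaped like the response above
sample = [[
    {"field": "@timestamp", "value": "2019-12-09 17:07:24.428"},
    {"field": "username", "value": "simran+test@example.com"},
    {"field": "ClinicID", "value": "7667"},
]]

rows = rows_to_dicts(sample)
print(rows[0]["ClinicID"])  # 7667
```

You would call it as `rows_to_dicts(response['results'])` once the polling loop above finishes.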

You can achieve this with the CloudWatch Logs client and a little bit of coding. You can also customize the conditions, or use the json module for a precise result.

EDIT

You can use describe_log_streams to get the streams. If you want only the latest, set the limit to 1; if you want more than one, use a for loop to iterate over all streams while filtering, as mentioned below.

    import boto3

    client = boto3.client('logs')

    ## For the latest
    stream_response = client.describe_log_streams(
        logGroupName="/aws/lambda/lambdaFnName", # Can be dynamic
        orderBy='LastEventTime',                 # Sort by last event time ...
        descending=True,                         # ... newest stream first
        limit=1                                  # just the latest stream
        )

    latestlogStreamName = stream_response["logStreams"][0]["logStreamName"]

    response = client.get_log_events(
        logGroupName="/aws/lambda/lambdaFnName",
        logStreamName=latestlogStreamName,
        startTime=12345678,                      # milliseconds since epoch
        endTime=12345678,
    )

    # get_log_events returns each message as a plain string,
    # so filter with substring checks (or parse the line first)
    for event in response["events"]:
        if "ClinicID: 7667" in event["message"]:
            print(event["message"])
        elif "username: simran+test@example.com" in event["message"]:
            print(event["message"])
        #.
        #.
        # more if or else conditions

    ## For more than one stream, e.g. the latest 5
    stream_response = client.describe_log_streams(
        logGroupName="/aws/lambda/lambdaFnName", # Can be dynamic
        orderBy='LastEventTime',                 # Sort by last event time ...
        descending=True,                         # ... newest streams first
        limit=5
        )

    for log_stream in stream_response["logStreams"]:
        latestlogStreamName = log_stream["logStreamName"]

        response = client.get_log_events(
             logGroupName="/aws/lambda/lambdaFnName",
             logStreamName=latestlogStreamName,
             startTime=12345678,                 # milliseconds since epoch
             endTime=12345678,
        )
        ## For example, you want to search "ClinicID=7667", can be dynamic

        for event in response["events"]:
           if "ClinicID: 7667" in event["message"]:
             print(event["message"])
           elif "username: simran+test@example.com" in event["message"]:
             print(event["message"])
           #.
           #.
           # more if or else conditions

Let me know how it goes.

I used awslogs. If you install it, you can do the following; --watch will tail the new logs.

awslogs get /aws/lambda/log-group-1 --start="5h ago" --watch

You can install it using:

pip install awslogs

To filter, you can do:

awslogs get /aws/lambda/log-group-1  --filter-pattern '"ClinicID=7667"' --start "5h ago" --timestamp

It supports multiple filter patterns as well.

awslogs get /aws/lambda/log-group-1  --filter-pattern '"ClinicID=7667"' --filter-pattern '" username=simran+test@abc.com"' --start "5h ago" --timestamp

References:

awslogs

awslogs · PyPI

The easiest way is to use awswrangler:

import boto3
import awswrangler as wr

# must define this for wrangler to work (region is your AWS region string)
boto3.setup_default_session(region_name=region)

df = wr.cloudwatch.read_logs(
    log_group_names=["loggroup"],
    start_time=from_timestamp,
    end_time=to_timestamp,
    query="fields @timestamp, @message | sort @timestamp desc | limit 5",
)

You can pass a list of the log groups needed, plus a start and end time. The output is a pandas DataFrame containing the results.
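Because the result is a regular pandas DataFrame, the filtering from the question becomes ordinary string matching. A sketch using a hand-built stand-in for the DataFrame that read_logs would return (the data below is illustrative, taken from the question's sample logs):

```python
import pandas as pd

# Stand-in for the DataFrame wr.cloudwatch.read_logs would return
df = pd.DataFrame({
    "timestamp": ["2019-12-09 17:07:24.428", "2019-12-09 17:08:01.112"],
    "message": [
        "INFO:: username: simran+test@example.com ClinicID: 7667 nodename: MacBook-Pro-2.local",
        "INFO:: username: simran+test2@example.com ClinicID: 7668 nodename: MacBook-Pro-2.local",
    ],
})

# Filter rows whose message mentions ClinicID 7667
clinic_7667 = df[df["message"].str.contains("ClinicID: 7667", regex=False)]
print(len(clinic_7667))  # 1
```

Alternatively, you can push the filtering into the Logs Insights query string itself (as in the start_query answer above) so only matching rows ever reach pandas.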

FYI, under the hood, awswrangler uses the same boto3 commands as in @dejan's answer.
