
Boto3 CloudFront Object Usage Count

I'm looking to count how many times each object in my CloudFront distribution has been hit, so that I can generate an Excel sheet to track usage stats. I've been looking through the boto3 docs for CloudFront and haven't been able to pin down where that information can be accessed. I see that the AWS CloudFront console generates a 'Popular Objects' report. Does anyone know how to get the numbers that AWS generates for that report through boto3?

If it's not accessible through Boto3, is there an AWS CLI command that I should use instead?

UPDATE:

Here's what I ended up using as pseudo-code; hopefully it's a starting point for someone else:

import boto3
import gzip
import shutil
from datetime import datetime, date, timedelta
from xlwt import Workbook

# Placeholder configuration: replace with your own values before running
AWS_ACCESS_KEY_ID = 'YOUR_ACCESS_KEY_ID'
AWS_SECRET_ACCESS_KEY = 'YOUR_SECRET_ACCESS_KEY'
AWS_STORAGE_BUCKET_NAME = 'your-cloudfront-log-bucket'
CLOUDFRONT_IDENTIFIER = 'YOUR_DISTRIBUTION_LOG_PREFIX'  # prefix on the log file names

def analyze(timeInterval):
    """
    analyze usage data in cloudfront
    :param domain:
    :param id:
    :param password:
    :return: usage data
    """
    outputDict = {}

    # connect to the S3 bucket that holds the CloudFront access logs
    # (consider an IAM role or AWS profile instead of hard-coded keys)
    s3 = boto3.resource('s3', aws_access_key_id=AWS_ACCESS_KEY_ID,
                        aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
    logBucket = s3.Bucket(AWS_STORAGE_BUCKET_NAME)

    currentDate = date.today()

    # how far back to look for each reporting interval
    intervals = {
        'daily': timedelta(days=1),
        'weekly': timedelta(days=7),
        'monthly': timedelta(weeks=4),
        'yearly': timedelta(weeks=52),
    }

    # create excel workbook/sheet that we'll save results to
    wb = Workbook()
    sheet1 = wb.add_sheet('Log Results By URL')
    sheet1.write(0, 1, 'File')
    sheet1.write(0, 2, 'Total Hit Count')
    sheet1.write(0, 3, 'Total Byte Count')

    for item in logBucket.objects.all():
        # log keys look like '<CLOUDFRONT_IDENTIFIER>.YYYY-MM-DD-HH.<unique-id>.gz';
        # pull the date out of the key so files outside the window can be skipped
        dateString = str(item.key).replace(CLOUDFRONT_IDENTIFIER + '.', '').split('.')[0][:-3]
        datetimeRef = datetime.strptime(dateString, '%Y-%m-%d').date()
        # skip log files that fall outside the requested reporting window
        if timeInterval in intervals and datetimeRef <= currentDate - intervals[timeInterval]:
            continue
        print('Analyzing file:', item.key)

        # download the compressed log file and unzip it locally
        logBucket.download_file(item.key, 'logFile.gz')
        with gzip.open('logFile.gz', 'rb') as f_in:
            with open('logFile.txt', 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)

        # read the log lines, skipping the two header lines (#Version and #Fields)
        with open('logFile.txt', 'r') as f:
            logLines = f.readlines()[2:]

        # tally a hit count and byte total per URI; in the standard log format
        # field 7 is cs-uri-stem and field 3 is sc-bytes
        for logLine in logLines:
            fields = logLine.split('\t')
            uri = fields[7]
            byteCount = int(fields[3])
            if uri not in outputDict:
                outputDict[uri] = {'count': 1, 'byteCount': byteCount}
            else:
                outputDict[uri]['count'] += 1
                outputDict[uri]['byteCount'] += byteCount

    # write one row per URI with its hit count and byte total
    for row, (uri, stats) in enumerate(outputDict.items(), start=1):
        sheet1.write(row, 1, uri)
        sheet1.write(row, 2, stats['count'])
        sheet1.write(row, 3, stats['byteCount'])

    # save the workbook (colons are not valid in file names on some platforms)
    safeDateTime = str(datetime.now()).replace(':', '.')
    wb.save('{}_Log_Result_{}.xls'.format(timeInterval, safeDateTime))


if __name__ == '__main__':
    analyze('daily')

From Configuring and Using Standard Logs (Access Logs) - Amazon CloudFront:

You can configure CloudFront to create log files that contain detailed information about every user request that CloudFront receives. These are called standard logs, also known as access logs. These standard logs are available for both web and RTMP distributions. If you enable standard logs, you can also specify the Amazon S3 bucket that you want CloudFront to save files in.
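
If you want to turn standard logging on without going through the console, here is a minimal boto3 sketch (the distribution ID and log bucket are placeholders; it reads the current distribution config, enables the Logging section, and writes the config back with the ETag it was read with):

import boto3

cloudfront = boto3.client('cloudfront')

# fetch the current distribution config together with its ETag
response = cloudfront.get_distribution_config(Id='E1234EXAMPLE')  # placeholder distribution ID
config = response['DistributionConfig']
etag = response['ETag']

# point standard logs at an S3 bucket (placeholder bucket name)
config['Logging'] = {
    'Enabled': True,
    'IncludeCookies': False,
    'Bucket': 'my-cloudfront-logs.s3.amazonaws.com',
    'Prefix': 'cf-logs/',
}

# write the modified config back; IfMatch must carry the ETag read above
cloudfront.update_distribution(
    Id='E1234EXAMPLE',
    DistributionConfig=config,
    IfMatch=etag,
)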

The log files can be quite large, but you can Query Amazon CloudFront Logs using Amazon Athena.
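
As a rough sketch of driving one of those Athena queries from boto3, assuming you have already created the cloudfront_logs table described in that guide (the database name, results bucket, and 7-day window below are placeholders/assumptions):

import time
import boto3

athena = boto3.client('athena')

# hits and bytes per object over the last 7 days; column names follow the
# example cloudfront_logs table from the AWS guide
query = """
SELECT uri, COUNT(*) AS hits, SUM(bytes) AS total_bytes
FROM cloudfront_logs
WHERE "date" >= date_add('day', -7, current_date)
GROUP BY uri
ORDER BY hits DESC
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'default'},                      # placeholder database
    ResultConfiguration={'OutputLocation': 's3://my-athena-results/'},  # placeholder bucket
)
query_id = execution['QueryExecutionId']

# poll until the query finishes
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(1)

if state == 'SUCCEEDED':
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results['ResultSet']['Rows'][1:]:  # the first row is the header
        print([col.get('VarCharValue') for col in row['Data']])

Each result row could then be written into the same xlwt workbook used above instead of being printed.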
