
Boto3 CloudFront Object Usage Count

I am looking to count the number of times each object in my CloudFront distribution has been hit individually, so that I can generate an Excel sheet to track usage statistics. I have been going through the boto3 documentation for CloudFront but cannot work out where that information would be accessible. I can see that the AWS CloudFront console generates a "Popular Objects" report. Does anyone know how to get the numbers AWS produces for that report through boto3?

If it is not accessible through boto3, should I be using an AWS CLI command instead?

Update:

This is what I ended up using as pseudocode; hopefully it can serve as a starting point for someone else:

import boto3
import gzip
from datetime import datetime, date, timedelta
import shutil
from xlwt import Workbook  # third-party package: pip install xlwt

# Fill these in with your own values before running.
AWS_ACCESS_KEY_ID = '<your-access-key-id>'
PASSWORD = '<your-secret-access-key>'
AWS_STORAGE_BUCKET_NAME = '<bucket-that-receives-the-cloudfront-logs>'
CLOUDFRONT_IDENTIFIER = '<log-file-prefix-including-the-distribution-id>'

def analyze(timeInterval):
    """
    Analyze CloudFront usage by reading the standard (access) log files from S3.
    :param timeInterval: one of 'daily', 'weekly', 'monthly' or 'yearly'
    :return: nothing; results are written to an .xls workbook
    """
    # per-URI totals: {uri: {'count': hits, 'byteCount': bytes served}}
    outputDict = {}

    s3 = boto3.resource('s3', aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=PASSWORD)
    logBucket = s3.Bucket(AWS_STORAGE_BUCKET_NAME)
    count = 0
    currentDatetime = date.today()

    # create excel workbook/sheet that we'll save results to
    wb = Workbook()
    sheet1 = wb.add_sheet('Log Results By URL')
    sheet1.write(0, 1, 'File')
    sheet1.write(0, 2, 'Total Hit Count')
    sheet1.write(0, 3, 'Total Byte Count')

    for item in logBucket.objects.all():
        count += 1
        # log keys look like <CLOUDFRONT_IDENTIFIER>.YYYY-MM-DD-HH.<unique-id>.gz;
        # pull out the date portion so the file can be filtered by age
        dateString = str(item.key).replace(CLOUDFRONT_IDENTIFIER + '.', '').split('.')[0][:-3]
        datetimeRef = datetime.strptime(dateString, '%Y-%m-%d').date()
        # print('comparing', datetimeRef - timedelta(days=1), currentDatetime)
        if timeInterval == 'daily':
            if datetimeRef > currentDatetime - timedelta(days=1):
                pass
            else:
                # file not within datetime restrictions, don't do stuff
                continue
        elif timeInterval == 'weekly':
            if datetimeRef > currentDatetime - timedelta(days=7):
                pass
            else:
                # file not within datetime restrictions, don't do stuff
                continue
        elif timeInterval == 'monthly':
            if datetimeRef > currentDatetime - timedelta(weeks=4):
                pass
            else:
                # file not within datetime restrictions, don't do stuff
                continue
        elif timeInterval == 'yearly':
            if datetimeRef > currentDatetime - timedelta(weeks=52):
                pass
            else:
                # file not within datetime restrictions, don't do stuff
                continue
        print('datetimeRef', datetimeRef)
        print('currentDatetime', currentDatetime)
        print('Analyzing File:', item.key)

        # download the file
        s3.Bucket(AWS_STORAGE_BUCKET_NAME).download_file(item.key, 'logFile.gz')

        # unzip the file
        with gzip.open('logFile.gz', 'rb') as f_in:
            with open('logFile.txt', 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)

        # read the unzipped log and tally hits and bytes per URI
        with open('logFile.txt', 'r') as f:
            for lineNumber, line in enumerate(f):
                # the first two lines are the #Version and #Fields headers
                if lineNumber < 2:
                    continue
                fields = line.split('\t')
                uri = fields[7]              # cs-uri-stem column
                byteCount = int(fields[3])   # sc-bytes column
                if outputDict.get(uri) is None:
                    outputDict[uri] = {'count': 1, 'byteCount': byteCount}
                else:
                    outputDict[uri]['count'] += 1
                    outputDict[uri]['byteCount'] += byteCount

    # write one row per URI to the excel sheet
    row = 1
    for uri, totals in outputDict.items():
        sheet1.write(row, 1, str(uri))
        sheet1.write(row, 2, totals['count'])
        sheet1.write(row, 3, totals['byteCount'])
        row += 1

    # save the workbook (':' in the timestamp is not a valid filename character everywhere)
    safeDateTime = str(datetime.now()).replace(':', '.')
    wb.save(str(timeInterval) + '_Log_Result_' + safeDateTime + '.xls')


if __name__ == '__main__':
    analyze('daily')

Configuring and using standard logs (access logs) - Amazon CloudFront

You can configure CloudFront to create log files that contain detailed information about every user request that CloudFront receives. These are called standard logs, also known as access logs, and they are available for both web and RTMP distributions. When you enable standard logs, you also specify the Amazon S3 bucket in which you want CloudFront to save the files.
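Standard logging can also be turned on programmatically. The following is a minimal sketch using boto3's CloudFront client; the distribution ID, bucket, and prefix are placeholder values, and the rest of the existing distribution configuration is passed back unchanged, since update_distribution expects the full config plus the ETag returned by get_distribution_config:

import boto3

cloudfront = boto3.client('cloudfront')

# Placeholder values -- replace with your own distribution ID and log bucket.
DISTRIBUTION_ID = 'E1EXAMPLE12345'
LOG_BUCKET = 'my-cloudfront-logs.s3.amazonaws.com'

# Fetch the current config and its ETag (update_distribution requires both).
response = cloudfront.get_distribution_config(Id=DISTRIBUTION_ID)
config = response['DistributionConfig']
etag = response['ETag']

# Enable standard logging and point it at the S3 bucket.
config['Logging'] = {
    'Enabled': True,
    'IncludeCookies': False,
    'Bucket': LOG_BUCKET,
    'Prefix': 'cf-logs/',
}

cloudfront.update_distribution(
    DistributionConfig=config,
    Id=DISTRIBUTION_ID,
    IfMatch=etag,
)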

The log files can get very large, but you can use Amazon Athena to query the Amazon CloudFront logs.
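As a rough sketch of that approach, assuming a cloudfront_logs table has already been created in Athena over the log bucket using the column names from the AWS documentation (uri, bytes, date, ...), a per-object hit count similar to the "Popular Objects" report could be produced like this; the database name and query-result location are placeholders:

import boto3

athena = boto3.client('athena')

# Placeholder values -- adjust the database and the query-result bucket to your setup.
DATABASE = 'default'
OUTPUT_LOCATION = 's3://my-athena-query-results/'

# Count hits and bytes per object over the last 7 days.
query = """
    SELECT uri, COUNT(*) AS hit_count, SUM(bytes) AS total_bytes
    FROM cloudfront_logs
    WHERE "date" >= date_add('day', -7, current_date)
    GROUP BY uri
    ORDER BY hit_count DESC
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': DATABASE},
    ResultConfiguration={'OutputLocation': OUTPUT_LOCATION},
)

# The results are written to OUTPUT_LOCATION as CSV and can also be read
# back with athena.get_query_results(QueryExecutionId=...).
print('Started query:', execution['QueryExecutionId'])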

