Creating a dmarc parser using parsedmarc in python3 for use in AWS s3

I am very new to programming. I am working on a pipeline to analyze DMARC report files that are sent to my email account and that I manually place in an s3 bucket. The goal of this task is to download, extract, and analyze the files using parsedmarc (https://github.com/domainaware/parsedmarc). The part I'm having difficulty with is writing a conditional statement that extracts .gz files when the target file is not a .zip file. I'm assuming the gzip library will be sufficient for this purpose. Here is the code I have so far. I'm using python3 and the boto3 library for AWS. Any help is appreciated!

import parsedmarc    
import pprint
import json
import boto3
import zipfile
import gzip

pp = pprint.PrettyPrinter(indent=2)

def main():
    #Set default session profile and region for sandbox account. Access keys are pulled from ~/.aws/config and ~/.aws/credentials.
    #The 'profile_name' value comes from the header for the account in question in ~/.aws/config and ~/.aws/credentials
    #Both values go in one call: a second setup_default_session call would replace the first session.
    boto3.setup_default_session(profile_name="aws-account-profile-name-goes-here", region_name="aws-region-goes-here")

    #Define the s3 resource, the bucket name, and the file to download. It's hardcoded for now...
    s3_resource = boto3.resource('s3')
    s3_resource.Bucket('dmarc-parsing').download_file('source-dmarc-report-filename.zip', '/home/user/dmarc/parseme.zip')

    #Use the zipfile python library to extract the file into its raw state.
    with zipfile.ZipFile('/home/user/dmarc/parseme.zip', 'r') as zip_ref:
        zip_ref.extractall('/home/user/dmarc')

    #Ingest all locations for xml file source
    dmarc_report_directory = '/home/user/dmarc/'
    dmarc_report_file = 'parseme.xml'

    """I need an if statement here for extracting .gz files if the file type is not .zip. The contents of every archive are .xml files"""

    #Set report output variables using functions in parsedmarc. Variable set to equal the output
    pd_report_output=parsedmarc.parse_aggregate_report_file(_input=f"{dmarc_report_directory}{dmarc_report_file}")
    #Round-trip through json so the output is made of plain dicts and lists
    pd_report_jsonified = json.loads(json.dumps(pd_report_output))

    dkim_status = pd_report_jsonified['records'][0]['policy_evaluated']['dkim']
    spf_status = pd_report_jsonified['records'][0]['policy_evaluated']['spf']

    if dkim_status == 'fail' or spf_status == 'fail':
        print(f"{dmarc_report_file} reports failure. oh crap. report:")
    else:
        print(f"{dmarc_report_file} passes. great. report:")

    pp.pprint(pd_report_jsonified['records'][0]['auth_results'])


if __name__ == "__main__":
    main()
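
The conditional asked for in the placeholder comment above could be sketched with the standard library alone. This is a minimal sketch, not a definitive implementation: the function name `extract_report_xml` is hypothetical, and it assumes each archive holds a single UTF-8 XML report. It inspects the file contents instead of trusting the file extension.

```python
import gzip
import zipfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def extract_report_xml(archive_path):
    """Return the XML text stored in a .zip or .gz report archive.

    Hypothetical helper: assumes one UTF-8 encoded XML report per archive.
    """
    archive_path = Path(archive_path)

    # Zip branch: is_zipfile checks the file structure, not the file name.
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zip_ref:
            xml_names = [n for n in zip_ref.namelist() if n.lower().endswith(".xml")]
            if not xml_names:
                raise ValueError(f"no .xml member in {archive_path}")
            # Assumption: only the first XML member matters.
            return zip_ref.read(xml_names[0]).decode("utf-8")

    # Gzip branch: compare the first two bytes with the gzip magic number.
    with open(archive_path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    if is_gzip:
        with gzip.open(archive_path, "rb") as gz_ref:
            return gz_ref.read().decode("utf-8")

    raise ValueError(f"{archive_path} is neither a zip nor a gzip file")
```

The returned string could then be handed to parsedmarc.parse_aggregate_report_xml, so nothing needs to be unpacked onto disk first.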

Here is the code using the parsedmarc.parse_aggregate_report_xml method I found. parsedmarc.extract_xml accepts both .zip and .gz archives, so no separate conditional is needed. Hope this helps others in parsing these reports:

import parsedmarc
import pprint
import json
import boto3
import zipfile
import gzip

pp = pprint.PrettyPrinter(indent=2)

def main():

    #Set default session profile and region for account. Access keys are pulled from ~/.aws/config and ~/.aws/credentials.
    #The 'profile_name' value comes from the header for the account in question in ~/.aws/config and ~/.aws/credentials
    boto3.setup_default_session(profile_name="aws_profile_name_goes_here", region_name="region_goes_here")

    source_file = 'filename_in_s3_bucket.zip'
    destination_directory = '/tmp/'
    destination_file = 'compressed_report_file'

    #Define the s3 resource, the bucket name, and the file to download. It's hardcoded for now...
    s3_resource = boto3.resource('s3')
    s3_resource.Bucket('bucket-name-for-dmarc-report-files').download_file(source_file, f"{destination_directory}{destination_file}")

    #Extract xml
    outputxml = parsedmarc.extract_xml(f"{destination_directory}{destination_file}")

    #run parse dmarc analysis & convert output to json
    pd_report_output = parsedmarc.parse_aggregate_report_xml(outputxml)
    pd_report_jsonified = json.loads(json.dumps(pd_report_output))

    #loop through results and find relevant status info and pass fail status
    dmarc_report_status = ''
    for record in pd_report_jsonified['records']:
        if False in record['alignment'].values():
            dmarc_report_status = 'Failed'
            #************ add logic for interpreting results

    #if fail, publish to sns
    if dmarc_report_status == 'Failed':

        message = "Your dmarc report failed at least one check. Review the log for details"

        sns_resource = boto3.resource('sns')
        sns_topic = sns_resource.Topic('arn:aws:sns:us-west-2:112896196555:TestDMARC')
        sns_publish_response = sns_topic.publish(Message=message)


if __name__ == "__main__":
    main()
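
The "add logic for interpreting results" placeholder in the loop above could be filled in along these lines. This is a sketch under stated assumptions: the helper names `find_failed_records` and `build_message` are hypothetical, and it relies only on the 'records' and 'alignment' keys already used in the code above, with each alignment value being True or False. The check names in any sample data ('spf', 'dkim', 'dmarc') are illustrative.

```python
def find_failed_records(report):
    """Return (record number, failing check names) for each failing record.

    Hypothetical helper: record numbers start at 1 and follow report order.
    """
    failed = []
    for index, record in enumerate(report.get("records", []), start=1):
        alignment = record.get("alignment", {})
        # Collect the names of the checks whose value is exactly False.
        failing = sorted(name for name, ok in alignment.items() if ok is False)
        if failing:
            failed.append((index, failing))
    return failed


def build_message(report):
    """Build a short alert text, or return None when every record aligns."""
    failed = find_failed_records(report)
    if not failed:
        return None
    total = len(report.get("records", []))
    lines = [f"{len(failed)} of {total} records failed at least one alignment check:"]
    for index, failing in failed:
        lines.append(f"  record {index}: {', '.join(failing)}")
    return "\n".join(lines)
```

The text returned by `build_message` could replace the fixed message string passed to sns_topic.publish, so the notification names the records that failed instead of pointing to the log.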
