
Google Cloud DLP API: How to get full DLP job inspection results when inspecting Google Cloud Storage files

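I am running a DLP inspection job against Google Cloud Storage, and I would like to know whether there is a way to get the full inspection results rather than a summary, as I can when inspecting external files. Here is the code snippet showing how I get the inspection results when scanning external and local files: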

    # Print out the results.
    results = []
    if response.result.findings:
        for finding in response.result.findings:
            finding_dict = {
                "quote": finding.quote if "quote" in finding else None,
                "info_type": finding.info_type.name,
                "likelihood": finding.likelihood.name,
                "location_start": finding.location.byte_range.start,
                "location_end": finding.location.byte_range.end
            }
            results.append(finding_dict)
    else:
        print("No findings.")

The output looks like this:

{
    "quote": "gitlab.com",
    "info_type": "DOMAIN_NAME",
    "likelihood": "LIKELY",
    "location_start": 3015,
    "location_end": 3025
},
{
    "quote": "www.makeareadme.com",
    "info_type": "DOMAIN_NAME",
    "likelihood": "LIKELY",
    "location_start": 3107,
    "location_end": 3126
}

But when I scan Google Cloud Storage objects using the get_dlp_job method with Pub/Sub:

    def callback(message):
        try:
            if message.attributes["DlpJobName"] == operation.name:
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp_client.get_dlp_job(request={"name": operation.name})
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print(
                            "Info type: {}; Count: {}".format(
                                finding.info_type.name, finding.count
                            )
                        )
                else:
                    print("No findings.")

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

the results come back in the following summary format:

Info type: LOCATION; Count: 18
Info type: DATE; Count: 12
Info type: LAST_NAME; Count: 4
Info type: DOMAIN_NAME; Count: 170
Info type: URL; Count: 20
Info type: FIRST_NAME; Count: 7

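Is there a way to get detailed inspection results when scanning files on Google Cloud Storage, where I would get the quote, info type, likelihood, and so on, without them being summarized? I have tried several approaches and read through almost all of the documentation, but I have not found anything that helps. I am running the inspection job in a Windows environment using the DLP Python client API. I would appreciate any help with this ;)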

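Yes, you can do this. Because detailed inspection findings can be sensitive, they are not retained in the job details/summary, but you can configure a job "action" to write the detailed findings to a BigQuery table that you own/control. That way you can access the details of each finding (file or table path, column name, byte offset, optional quote, etc.).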

The API details are here: https://cloud.google.com/dlp/docs/reference/rest/v2/Action#SaveFindings
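As a rough sketch of the job configuration described above, the snippet below builds an `inspect_job` with a `save_findings` action pointing at a BigQuery table, plus an optional `pub_sub` action like the one the question already relies on. The helper names `build_inspect_job` and `create_job`, the bucket wildcard, and all project/dataset/table/topic values are hypothetical placeholders, not part of the original post; the field names follow the DLP v2 `Action`, `InspectConfig`, and `StorageConfig` messages as I understand them, so check them against the reference linked above before relying on this.

```python
def build_inspect_job(bucket, info_types, project, dataset_id, table_id,
                      topic_id=None, include_quote=True):
    """Build an inspect_job dict whose detailed findings go to BigQuery.

    All arguments are placeholders for illustration only.
    """
    actions = [
        {
            # SaveFindings: write every individual finding to a table you control.
            "save_findings": {
                "output_config": {
                    "table": {
                        "project_id": project,
                        "dataset_id": dataset_id,
                        "table_id": table_id,
                    }
                }
            }
        }
    ]
    if topic_id:
        # Optional: keep the Pub/Sub notification used to detect job completion.
        actions.append(
            {"pub_sub": {"topic": f"projects/{project}/topics/{topic_id}"}}
        )

    return {
        "inspect_config": {
            "info_types": [{"name": name} for name in info_types],
            # Ask DLP to include the matched text with each finding.
            "include_quote": include_quote,
        },
        "storage_config": {
            "cloud_storage_options": {"file_set": {"url": f"gs://{bucket}/*"}}
        },
        "actions": actions,
    }


def create_job(project, inspect_job):
    """Submit the job; needs google-cloud-dlp and valid credentials."""
    # Imported lazily so build_inspect_job works without the client installed.
    from google.cloud import dlp_v2

    dlp_client = dlp_v2.DlpServiceClient()
    return dlp_client.create_dlp_job(
        request={
            "parent": f"projects/{project}/locations/global",
            "inspect_job": inspect_job,
        }
    )
```

Something like `create_job("my-project", build_inspect_job("my-bucket", ["DOMAIN_NAME"], "my-project", "dlp_results", "findings", topic_id="dlp-topic"))` would then start the scan. The `info_type_stats` summary is still available from `get_dlp_job`, while the per-finding rows should land in the BigQuery table.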

Here is more documentation on how to query the detailed findings:
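For illustration, once findings have been saved, a query along these lines could reproduce the per-finding fields from the question (quote, info type, likelihood, byte range). This is only a sketch: it assumes the findings table mirrors the DLP `Finding` message (`quote`, `info_type.name`, `likelihood`, `location.byte_range`), and the helper names `build_findings_query` and `fetch_findings` and all table identifiers are hypothetical.

```python
def build_findings_query(project, dataset_id, table_id, limit=100):
    """Return a BigQuery SQL string selecting per-finding details.

    Assumes the table schema mirrors the DLP Finding message.
    """
    table = f"`{project}.{dataset_id}.{table_id}`"
    return (
        "SELECT quote, "
        "info_type.name AS info_type, "
        "likelihood, "
        "location.byte_range.start AS location_start, "
        # `end` is backtick-quoted because END is a reserved word in BigQuery.
        "location.byte_range.`end` AS location_end "
        f"FROM {table} "
        f"LIMIT {int(limit)}"
    )


def fetch_findings(project, dataset_id, table_id, limit=100):
    """Run the query; needs google-cloud-bigquery and valid credentials."""
    # Imported lazily so build_findings_query works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    query = build_findings_query(project, dataset_id, table_id, limit)
    # Convert each result row into a plain dict, like finding_dict in the question.
    return [dict(row.items()) for row in client.query(query).result()]
```

Each returned dict would then have the same keys as the `finding_dict` built for local files, provided the schema assumption holds.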

And here are more details on DLP job actions: https://cloud.google.com/dlp/docs/concepts-actions

