Google Cloud DLP API: How to get full DLP job inspection results when inspecting Google Cloud Storage files

I am running DLP inspection jobs on files in Google Cloud Storage, and I was wondering whether there is a way to get the full inspection results instead of just the summary, the same way I can when inspecting external files. Here is a code snippet showing how I get the inspection results when scanning external and local files:

    # Print out the results.
    results = []
    if response.result.findings:
        for finding in response.result.findings:
            finding_dict = {
                "quote": finding.quote if "quote" in finding else None,
                "info_type": finding.info_type.name,
                "likelihood": finding.likelihood.name,
                "location_start": finding.location.byte_range.start,
                "location_end": finding.location.byte_range.end
            }
            results.append(finding_dict)
    else:
        print("No findings.")

The output looks like this:

{
    "quote": "gitlab.com",
    "info_type": "DOMAIN_NAME",
    "likelihood": "LIKELY",
    "location_start": 3015,
    "location_end": 3025
},
{
    "quote": "www.makeareadme.com",
    "info_type": "DOMAIN_NAME",
    "likelihood": "LIKELY",
    "location_start": 3107,
    "location_end": 3126
}

But when I scan Google Cloud Storage items and fetch the job with the get_dlp_job method from a Pub/Sub callback, like this:

    def callback(message):
        try:
            if message.attributes["DlpJobName"] == operation.name:
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp_client.get_dlp_job(request={"name": operation.name})
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print(
                            "Info type: {}; Count: {}".format(
                                finding.info_type.name, finding.count
                            )
                        )
                else:
                    print("No findings.")

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

The results are in this summary format:

Info type: LOCATION; Count: 18
Info type: DATE; Count: 12
Info type: LAST_NAME; Count: 4
Info type: DOMAIN_NAME; Count: 170
Info type: URL; Count: 20
Info type: FIRST_NAME; Count: 7

Is there a way to get the detailed inspection results when scanning files on Google Cloud Storage, so that I get the quote, info_type, likelihood, etc. for each finding instead of a summary? I have tried a couple of methods and read through almost all of the docs, but I have not found anything that helps. I am running the inspection job in a Windows environment with the DLP Python client API. I would appreciate anyone's help with this ;)

Yes, you can do this. Because the detailed inspection results can be sensitive, they are not kept in the job details/summary, but you can configure a job "action" that writes the detailed results to a BigQuery table that you own and control. This way you get access to the details of every finding (file or table path, column name, byte offset, optional quote, etc.).

The API details for that are here: https://cloud.google.com/dlp/docs/reference/rest/v2/Action#SaveFindings
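As a rough illustration of the SaveFindings action, here is a minimal sketch of an inspect job for a bucket that writes every finding to BigQuery. The helper names build_inspect_job and start_inspect_job, and the bucket, project, dataset and table values, are hypothetical; the dataset is assumed to exist already, and the job is assumed to run in the "global" location. The dict keys follow the InspectJobConfig and Action messages of the v2 API.

```python
def build_inspect_job(bucket, project, dataset, table, info_types):
    """Return an inspect_job dict that saves each finding to a BigQuery table.

    All argument values are placeholders chosen by the caller.
    """
    return {
        "storage_config": {
            # Scan every object directly under the bucket root.
            "cloud_storage_options": {"file_set": {"url": f"gs://{bucket}/*"}}
        },
        "inspect_config": {
            "info_types": [{"name": name} for name in info_types],
            # Store the matched text (the "quote") with each finding.
            "include_quote": True,
        },
        "actions": [
            {
                "save_findings": {
                    "output_config": {
                        "table": {
                            "project_id": project,
                            "dataset_id": dataset,
                            "table_id": table,
                        }
                    }
                }
            }
        ],
    }


def start_inspect_job(project, inspect_job):
    """Submit the job; needs the google-cloud-dlp package and credentials."""
    # Imported lazily so build_inspect_job works without the client installed.
    from google.cloud import dlp_v2

    dlp_client = dlp_v2.DlpServiceClient()
    parent = f"projects/{project}/locations/global"  # assumed location
    return dlp_client.create_dlp_job(
        request={"parent": parent, "inspect_job": inspect_job}
    )


# Hypothetical usage:
# job_config = build_inspect_job(
#     "my-bucket", "my-project", "dlp_results", "findings", ["DOMAIN_NAME", "URL"]
# )
# operation = start_inspect_job("my-project", job_config)
```

If you still want the Pub/Sub notification from the question, a {"pub_sub": {"topic": topic_path}} entry can be appended to the "actions" list alongside save_findings.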

Below are some more docs on how to query the detailed findings:

There are also more details on DLP job actions here: https://cloud.google.com/dlp/docs/concepts-actions
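Once the job has finished, the BigQuery table that the job action wrote to can be read back into the same per-finding dictionaries as in the question. The following is only a sketch: build_findings_query and fetch_findings are hypothetical helpers, the table coordinates are placeholders, and the column names (quote, info_type.name, likelihood, location.byte_range) are assumed to mirror the fields of the Finding message, so check them against the schema of your own output table.

```python
def build_findings_query(project, dataset, table, limit=100):
    """Build a SQL query that returns one row per saved finding."""
    table_ref = f"`{project}.{dataset}.{table}`"
    return (
        "SELECT\n"
        "  quote,\n"
        "  info_type.name AS info_type,\n"
        "  likelihood,\n"
        "  location.byte_range.start AS location_start,\n"
        # `end` is backquoted because END is a reserved word in BigQuery SQL.
        "  location.byte_range.`end` AS location_end\n"
        f"FROM {table_ref}\n"
        f"LIMIT {int(limit)}"
    )


def fetch_findings(project, dataset, table, limit=100):
    """Run the query; needs the google-cloud-bigquery package and credentials."""
    # Imported lazily so build_findings_query works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    rows = client.query(build_findings_query(project, dataset, table, limit)).result()
    # Each Row becomes a dict keyed by the column aliases used in the query.
    return [dict(row.items()) for row in rows]


# Hypothetical usage:
# for finding in fetch_findings("my-project", "dlp_results", "findings", limit=10):
#     print(finding)
```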
