Aggregating JSON results with jq and passing an argument to a Python script

I have to aggregate a few JSON results from a site. Because the site has a query concurrency limit and the queries time out, the time frame for the queries has to be split into smaller ranges. So I am left with several JSON documents, one per query, as follows:

{
      "results": [
          [
              {
                  "field": "AccountId",
                  "value": "11352"
              },
              {
                  "field": "number_of_requests",
                  "value": "241398"
              }
          ],
          [
              {
                  "field": "AccountId",
                  "value": "74923"
              },
              {
                  "field": "number_of_requests",
                  "value": "238566"
              }
          ]
      ],
      "statistics": {
          "recordsMatched": 502870.0,
          "recordsScanned": 165908292.0,
          "bytesScanned": 744173091162.0
      },
      "status": "Complete"
}
{
      "results": [
          [
              {
                  "field": "AccountId",
                  "value": "11352"
              },
              {
                  "field": "number_of_requests",
                  "value": "185096"
              }
          ]
      ],
      "statistics": {
          "recordsMatched": 502870.0,
          "recordsScanned": 165908292.0,
          "bytesScanned": 744173091162.0
      },
      "status": "Complete"
}

My objective is to aggregate the results and also pass each account ID as an argument to a Python script that prints a string. The Python script runs as follows:

python check_account.py 11352

Internal

python check_account.py 74923

External

So my desired output to be added to the file is:

AccountID: Number of Requests: Type

11352: 426494: Internal

74923: 238566: External

So far I have the Python script, which uses an API to call the relevant functions and prints Internal or External as its output.

I also have the following shell script, which uses jq:

#!/bin/zsh

ResultsDir=$1

list=$(jq -nr '
[inputs | .results[] | map( { (.field) : .value} ) | add]
| group_by(.AccountId)
| map([.[0].AccountId, (map(.number_of_requests|tonumber) | add)])
| sort_by(.[1]) | reverse
| "\(.[]) " ' $ResultsDir)
echo "Results saved in file query-results"
echo "ACCOUNT ID : #_OF_REQUESTS" > $ResultsDir/query-results
echo "$list" >> $ResultsDir/query-results

I could do all of the above inside the Python code itself, but I was wondering whether there is a way to reuse the existing Python script from the shell script, since that Python code is already used by several other functions.
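For comparison, the pure-Python aggregation mentioned above could look roughly like the sketch below. It assumes the query results arrive as one string of concatenated JSON documents shaped like the sample; the function names `iter_json_documents` and `aggregate_requests` are made up for illustration and are not part of check_account.py.

```python
import json
from collections import defaultdict


def iter_json_documents(text):
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between documents; raw_decode does not skip it itself.
        if text[pos].isspace():
            pos += 1
            continue
        doc, pos = decoder.raw_decode(text, pos)
        yield doc


def aggregate_requests(text):
    """Sum number_of_requests per AccountId, largest total first."""
    totals = defaultdict(int)
    for doc in iter_json_documents(text):
        for row in doc["results"]:
            # Same idea as jq's map({(.field): .value}) | add
            record = {cell["field"]: cell["value"] for cell in row}
            totals[record["AccountId"]] += int(record["number_of_requests"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Calling `aggregate_requests` on the two sample documents should give one `(AccountId, total)` pair per account, which is the same data the jq pipeline produces.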

If I understand the question correctly, you want to call the Python script check_account.py from the shell script you have written. You want it this way because check_account.py is used in multiple places and you do not want to change its functionality.

The jq command in your script already produces each AccountId together with the aggregated number_of_requests. What remains is to call check_account.py with each AccountId as a parameter. The script below does that.

#!/bin/bash

ResultsDir=$1
outFile="$ResultsDir/query-results"

# Aggregate number_of_requests per AccountId, largest total first,
# and emit one "AccountId<TAB>total" line per account.
totals=$(jq -nr '
  [inputs | .results[] | map({ (.field): .value }) | add]
  | group_by(.AccountId)
  | map([.[0].AccountId, (map(.number_of_requests | tonumber) | add)])
  | sort_by(.[1]) | reverse
  | .[] | @tsv' input.js)

echo "AccountID : Number of Requests : Type" > "$outFile"

# Call check_account.py once per account and capture its STDOUT with $().
while IFS=$'\t' read -r accountId numberOfRequests; do
    [ -n "$accountId" ] || continue   # skip blank lines if jq printed nothing
    accountType=$(python check_account.py "$accountId")
    echo "$accountId : $numberOfRequests : $accountType" >> "$outFile"
done <<< "$totals"

echo "Results saved in file $outFile"

I do not have the implementation of check_account.py, so I wrote the following Python script as a stand-in to validate my shell script.

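#!/usr/bin/env python
import sys


def my_function(account_id):
    # Stand-in for the real check_account.py: handles only the two sample IDs.
    if account_id == 11352:
        print("Internal")
    elif account_id == 74923:
        print("External")


if __name__ == "__main__":
    my_function(int(sys.argv[1]))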

One key thing to note here is that command substitution, $(), captures the standard output (STDOUT) of the Python process. Put the command that calls the Python script inside the parentheses and assign the result to a variable; the shell strips the trailing newline for you.
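If the caller were itself written in Python, the closest counterpart to $() would be `subprocess.run` with captured output. The sketch below is only an illustration: the helper name `run_and_capture` is made up, and the child command is a one-line placeholder standing in for check_account.py so the example runs on its own (Python 3.7 or later is assumed for `capture_output`).

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command and return its STDOUT without the trailing newline."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    # $() in the shell also drops trailing newlines, so mirror that here.
    return completed.stdout.rstrip("\n")


# Placeholder child process standing in for check_account.py (an assumption):
# it only prints a fixed word.
child = [sys.executable, "-c", "print('Internal')"]
account_type = run_and_capture(child)
print(account_type)
```

With `check=True`, a non-zero exit status from the child raises `CalledProcessError` instead of silently returning an empty string, which is a little stricter than plain $() in the shell.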

By executing the shell script like this

./final-test.sh /Users/ajay/Desktop/

I got a file named query-results with the following content:

AccountID : Number of Requests : Type
11352 : 426494 : Internal
74923 : 238566 : External
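As a rough illustration of how those lines are assembled, the sketch below builds the same report from aggregated totals. The function `format_report` and the `sample_types` lookup are hypothetical; in the real setup the type comes from check_account.py, not from a dictionary.

```python
def format_report(totals, classify):
    """Build the report lines from (account_id, total) pairs.

    `classify` is a hypothetical callable mapping an account ID to its type.
    """
    lines = ["AccountID : Number of Requests : Type"]
    for account_id, total in totals:
        lines.append(f"{account_id} : {total} : {classify(account_id)}")
    return lines


# Stand-in lookup for illustration only; the real answer comes from check_account.py.
sample_types = {"11352": "Internal", "74923": "External"}
report = format_report([("11352", 426494), ("74923", 238566)], sample_types.get)
print("\n".join(report))
```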

Hope this helps in understanding how to call a Python script from a shell script and capture its result in the shell script.
