Following "Split/Slice large JSON using jq", we were able to slice a huge input file into smaller chunks based on array size.
We would now like to add a new JSON element to each array entry: a sequence number that increments over the original array. This should be combined with a filter/unique step on a few columns.
Input:
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}
Expected output after adding the additional key:
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":2,"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}
After filtering (unique by state, city, postal) and slicing into arrays of size 2:
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"}]}
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}
The sample below was used to filter/unique by a few columns, but its performance is not optimal:
< input.json jq -r --argjson size 2 '
  .add |= unique_by({city,state,postal})
  | del(.add) as $object
  | (.add | _nwise($size) | ("\t", $object + {add:.}))
' | awk '
  /^\t/ { fn++; next }
  { print >> ("part-" fn ".json") }'
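The tab-marker convention in the command above may be easier to follow outside of awk. As a rough illustration only, here is a minimal Python sketch of the same grouping logic. The function name `split_on_tab_markers` is hypothetical, and it returns the groups in a dict instead of appending to files.

```python
def split_on_tab_markers(lines, prefix="part-", suffix=".json"):
    """Group lines into named parts; a line starting with a tab opens the next part.

    Mirrors the awk script: marker lines are skipped, every other line is
    appended to the part that is currently open. Lines seen before the first
    marker would land in "part-0.json" here (awk would use "part-.json"),
    but the jq program always emits a marker first, so that case does not arise.
    """
    parts = {}
    fn = 0
    for line in lines:
        if line.startswith("\t"):
            fn += 1          # same as awk's fn++ ; next
            continue
        parts.setdefault(f"{prefix}{fn}{suffix}", []).append(line)
    return parts


# Simulated jq -r output: a tab line, then the lines of one JSON object, repeated.
demo = ["\t", '{"add":[1,2]}', "\t", '{"add":[3]}']
for name, body in split_on_tab_markers(demo).items():
    print(name, body)
```

Because jq's pretty-printer indents with spaces, no line of an object begins with a tab, so the marker cannot be confused with object content.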
Here's a solution that uses two general-purpose filters: one for enumerating, and the second for a sort-free, stream-oriented variant of unique_by:
# counting from 1
def enumerate(s; $key):
  foreach s as $x (0; .+1; {($key): .} + $x);

# emits a stream of the first item, $x, in the stream for which f assumes the value ($x|f).
def uniques_by(stream; f):
  reduce stream as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tojson) end) as $y
    # .[$t] is null the first time a type is seen, and has/1 raises an error on null
    | if (.[$t] // {}) | has($y) then . else .[$t][$y] = $x end )
  | .[][] ;

# $size is supplied on the command line, e.g. --argjson size 2
.add |= [enumerate(uniques_by(.[]; {city,state,postal}); "rownum")]
| del(.add) as $object
| (.add | _nwise($size) | ("\t", $object + {add:.} ))
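One detail worth checking is the order of the two steps. The jq program above numbers the rows after removing duplicates, which gives rownum values 1, 2, 3, whereas the expected output in the question shows 1, 3, 4 (numbered first, then filtered). Presumably, swapping the calls to `[uniques_by(enumerate(.[]; "rownum"); {city,state,postal})]` would produce the latter. The following is a minimal Python sketch of both orderings, intended only as a reference for the semantics. `split_record` and its `number_first` flag are hypothetical names, and the other helpers merely imitate the jq filters of the same name.

```python
import json
from itertools import islice


def enumerate_rows(rows, key="rownum", start=1):
    """Yield a copy of each row with a 1-based sequence number placed first."""
    for n, row in enumerate(rows, start):
        yield {key: n, **row}


def uniques_by(rows, fields):
    """Yield the first row seen for each distinct combination of `fields`."""
    seen = set()
    for row in rows:
        k = tuple(row[f] for f in fields)
        if k not in seen:
            seen.add(k)
            yield row


def nwise(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def split_record(record, fields, size, number_first=True):
    """Return one output object per chunk of the deduplicated "add" array."""
    header = {k: v for k, v in record.items() if k != "add"}
    rows = record["add"]
    if number_first:
        rows = uniques_by(enumerate_rows(rows), fields)   # rownum reflects original position
    else:
        rows = enumerate_rows(uniques_by(rows, fields))   # rownum reflects position after filtering
    return [{**header, "add": chunk} for chunk in nwise(rows, size)]


# The input document from the question.
record = json.loads("""
{"recDt":"2021-01-05","country":"US","name":"ABC","number":"9828",
 "add":[
  {"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
  {"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
  {"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
  {"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}]}
""")

fields = ("state", "city", "postal")
for part in split_record(record, fields, size=2):
    print(json.dumps(part))
```

With `number_first=True` the two parts carry rownum values [1, 3] and [4], matching the expected output in the question. With `number_first=False` they carry [1, 2] and [3].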