Split/slice large JSON, sort-free unique by a few columns, and add an additional element using jq

Following "Split/Slice large JSON using jq", we are able to slice a huge input file into smaller chunks of data based on array size.

We would now like to add a new JSON element to each array item, holding an incrementing sequence number based on the item's position in the original array, and also filter the items so that they are unique by a few columns.

Input:

{"recDt":"2021-01-05",
 "country":"US",
 "name":"ABC",
 "number":"9828",
 "add": [
     {"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
     {"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
     {"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
     {"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
 ]
}

Expected output after adding the additional key:

{"recDt":"2021-01-05",
 "country":"US",
 "name":"ABC",
 "number":"9828",
 "add": [
     {"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
     {"rownum":2,"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
     {"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
     {"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
 ]
}

Expected output after filtering (unique by state, city, postal) and slicing with an array size of 2:

{"recDt":"2021-01-05",
 "country":"US",
 "name":"ABC",
 "number":"9828",
 "add": [
     {"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
     {"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"}]}

{"recDt":"2021-01-05",
 "country":"US",
 "name":"ABC",
 "number":"9828",
 "add": [
     {"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
 ]
}

The sample below was used to filter/unique by a few columns, but its performance is not optimal:

< input.json jq -r --argjson size 2 '
  .add |= unique_by({city,state,postal})
  | del(.add) as $object
  | (.add | _nwise($size) | ("\t", $object + {add:.}))
' | awk '
  /^\t/ { fn++; next }
  { print >> ("part-" fn ".json") }'

To add the sequence number, one could use

.add |= [ range(length) as $i | .[$i] | .rownum = $i+1 ]


or

.add |= ( to_entries | map( .value.rownum = .key+1 | .value ) )

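As a way to cross-check the two jq expressions above outside of jq, here is a minimal Python sketch of the same numbering step. The function name `add_rownum` and the shortened sample record are made up for illustration. Unlike the jq versions, which append `rownum` as the last key, this sketch puts it first, matching the expected output in the question.

```python
import json


def add_rownum(record, key="rownum"):
    """Return a copy of record whose "add" items carry their 1-based position."""
    numbered = [
        {key: position, **item}
        for position, item in enumerate(record["add"], start=1)
    ]
    # Build a new dict so the caller's record is left untouched.
    return {**record, "add": numbered}


# Shortened, hypothetical sample in the shape of the question's input.
record = json.loads('{"name":"ABC","add":[{"rngNum":"1"},{"rngNum":"2"}]}')
print(json.dumps(add_rownum(record)))
```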

Here's a solution that uses two general-purpose filters: one for enumerating, and a second that is a sort-free, stream-oriented variant of unique_by:

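  # Add a counter under $key to each item of the stream s, counting from 1.
  def enumerate(s; $key): foreach s as $x (0; .+1; {($key): .} + $x);

  # For each distinct value of f, emit the first item $x in the stream
  # for which ($x|f) has that value.
  # (.[$t] // {}) guards the first key of each type, since has/1 raises an error on null.
  def uniques_by(stream; f): 
    reduce stream as $x ({};
      ($x|f) as $s
      | ($s|type) as $t
      | (if $t == "string" then $s else ($s|tojson) end) as $y
      | if (.[$t] // {}) | has($y) then . else .[$t][$y] = $x end )
    | .[][] ;

  # Invoke with --argjson size 2 (or another chunk size).
  # As written, items are numbered after deduplication (1,2,3 for the sample input).
  # To keep each item's position in the original array (1,3,4), swap the nesting:
  #   [uniques_by(enumerate(.[]; "rownum"); {city,state,postal})]
  .add |= [enumerate(uniques_by(.[]; {city,state,postal}); "rownum")]
  | del(.add) as $object
  | (.add|_nwise($size) | ("\t", $object + {add:.} ))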
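For checking the output of the jq program against an independent reference, the whole pipeline can be sketched in Python as follows. This is an illustration and not part of the jq solution: the helper names (`enumerate_items`, `uniques_by`, `chunks`, `split_record`) are made up, and the key fields are assumed to hold hashable scalar values such as strings. Numbering is applied before deduplication here, so `rownum` reflects the position in the original array, as in the question's expected output.

```python
def enumerate_items(items, key="rownum"):
    """Yield each item with a 1-based counter stored under key."""
    for position, item in enumerate(items, start=1):
        yield {key: position, **item}


def uniques_by(items, fields):
    """Yield the first item seen for each distinct combination of fields (no sorting)."""
    seen = set()
    for item in items:
        combo = tuple(item.get(field) for field in fields)
        if combo not in seen:
            seen.add(combo)
            yield item


def chunks(seq, size):
    """Yield consecutive slices of seq holding at most size elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def split_record(record, size, fields=("city", "state", "postal")):
    """Number, deduplicate and slice record["add"]; return one record per slice."""
    head = {k: v for k, v in record.items() if k != "add"}
    kept = list(uniques_by(enumerate_items(record["add"]), fields))
    return [{**head, "add": part} for part in chunks(kept, size)]


# The input from the question.
record = {
    "recDt": "2021-01-05", "country": "US", "name": "ABC", "number": "9828",
    "add": [
        {"evnCd": "O", "rngNum": "1", "state": "TX", "city": "ANDERSON", "postal": "77830"},
        {"evnCd": "O", "rngNum": "2", "state": "TX", "city": "ANDERSON", "postal": "77830"},
        {"evnCd": "O", "rngNum": "3", "state": "TX", "city": "ANDERSON", "postal": "77831"},
        {"evnCd": "O", "rngNum": "4", "state": "TX", "city": "ANDERSON", "postal": "77832"},
    ],
}

for part in split_record(record, size=2):
    print([item["rownum"] for item in part["add"]])
```

With a chunk size of 2 this yields two records, the first holding rows 1 and 3 and the second holding row 4.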
