I have a ton of records (~4,500) that I've processed (using jq) down to a sequence of JSON grouped by hourly UTC time (~680 groups, all unique).
{
"2018-10-09T19:00:00.000Z": []
}
{
"2018-10-09T20:00:00.000Z": []
}
{
"2018-10-09T21:00:00.000Z": []
}
I'm pretty sure you can see where this is going, but I want to combine all these into a single JSON object to hand over to another system for more fun.
{
"2018-10-09T19:00:00.000Z": [],
"2018-10-09T20:00:00.000Z": [],
"2018-10-09T21:00:00.000Z": []
}
The last two things I'm doing before I get to the sequence of objects are:
group_by(.day)[] | { (.[0].day): . }
where .day is the ISO date you see referenced above.
I've tried a few things with the map and reduce functions, but can't seem to massage the data the way I want. I've spent a few hours on this and need to take a break, so any help or direction you can point me in would be great!
If everything is already in memory, you could modify the group_by line as follows (note the closing parenthesis):
reduce group_by(.day)[] as $in ({}; . + { ($in[0].day): $in })
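As a quick sanity check, here is that filter run end-to-end, assuming jq is on your PATH; the inline sample records (including the .v field) are made up for illustration:

```shell
# Demo input: an array of records, each carrying a .day field with the ISO hour.
printf '%s' '[{"day":"2018-10-09T19:00:00.000Z","v":1},
              {"day":"2018-10-09T19:00:00.000Z","v":2},
              {"day":"2018-10-09T20:00:00.000Z","v":3}]' |
jq 'reduce group_by(.day)[] as $in ({}; . + { ($in[0].day): $in })'
# Produces a single object with one key per distinct .day value,
# each mapping to the array of records for that hour.
```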
group_by
Since group_by entails a sort, it may be unnecessarily inefficient. You might like to consider using a sort-free variant such as the following:
# sort-free variant of group_by/1
# f must always evaluate to an integer or always to a string.
# Output: an array in the former case, or an object in the latter case
def GROUP_BY(f): reduce .[] as $x ({}; .[$x|f] += [$x] );
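Because GROUP_BY(.day) already produces an object keyed by the grouping value, it replaces both the group_by step and the key-wrapping step in one go. A small sketch with made-up records (the "B"/"A" keys are chosen out of order to show that no sorting occurs):

```shell
printf '%s' '[{"day":"B","v":1},{"day":"A","v":2}]' |
jq -c 'def GROUP_BY(f): reduce .[] as $x ({}; .[$x|f] += [$x]); GROUP_BY(.day)'
# → {"B":[{"day":"B","v":1}],"A":[{"day":"A","v":2}]}
```

Note that the keys come out in first-seen order rather than sorted order, which is exactly the overhead being avoided.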
If the stream of objects is already in a file, use inputs with the -n command-line option.
This will avoid the overhead of "slurping" but will still require enough RAM for the entire result to fit into memory. If that doesn't work for you, then you will have to resort to desperate measures :-)
This might be a useful starting point:
jq -n 'reduce inputs as $in ({}; . + $in)'
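Applied to a stream of single-key objects like the ones in the question (piped in here for illustration; in practice you would pass the filename instead):

```shell
printf '%s\n' '{"2018-10-09T19:00:00.000Z":[]}' '{"2018-10-09T20:00:00.000Z":[]}' |
jq -n -c 'reduce inputs as $in ({}; . + $in)'
# → {"2018-10-09T19:00:00.000Z":[],"2018-10-09T20:00:00.000Z":[]}
```

The -n option prevents jq from consuming the first input implicitly, so inputs sees every object in the stream.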