(jq newbie here, sorry if this question has an obvious answer:) )
I'd like to filter JSON based on whether a value is not in a list.
Here's a concrete example:
Input
[
{
"n": "A",
"a": 659533330984,
"vals": {
"n2": "B",
"b": 5193941030
}
},
{
"n": "A",
"a": 659533330984,
"vals": {
"n2": "C",
"b": 4872891707
}
},
{
"n": "B",
"a": 659533330984,
"vals": {
"n2": "C",
"b": 4872891707
}
}
]
Filter
[.n, .vals.n2]
not in (["A", "B"], ["B", "C"])
So, in jq, I tried the following commands (also based on this related question):
jq '[ .[] | select([.n, .vals.n2] as $i | (["A", "B"], ["B", "C"]) | index($i) | not )]'
and
jq '[ .[] | select([.n, .vals.n2] != (["A", "B"], ["B", "C"]))]'
However, both commands give the output:
[
{
"n": "A",
"a": 659533330984,
"vals": {
"n2": "B",
"b": 5193941030
}
},
{
"n": "A",
"a": 659533330984,
"vals": {
"n2": "C",
"b": 4872891707
}
},
{
"n": "A",
"a": 659533330984,
"vals": {
"n2": "C",
"b": 4872891707
}
},
{
"n": "B",
"a": 659533330984,
"vals": {
"n2": "C",
"b": 4872891707
}
}
]
whereas this would be the desired output -- without duplicates and with logical AND of all "blacklisted" values:
[
{
"n": "A",
"a": 659533330984,
"vals": {
"n2": "C",
"b": 4872891707
}
}
]
It makes sense that the second command does not work: if I understood correctly, the comma operator means that jq evaluates the expression once for every listed element, hence the duplicates. However, simply piping through unique does not help, since the output should not contain any of the filter pairs.
The only other idea I have at the moment is to pipe select through select through select... once for each item in the "blacklist". However, I'd like to read the blacklist as an input. I could create the command dynamically, but I was wondering whether there is a more elegant solution. It feels as if there must be one...
I'd be very happy to hear your input on how to approach this best.
I'm using jq version jq-1.5-1-a5b5cbe.
The expression:
([.n, .vals.n2]) not in (["A", "B"], ["B", "C"])
would be equivalent to:
([.n, .vals.n2]) != ["A", "B"] and ([.n, .vals.n2]) != ["B", "C"]
As you have it here:
select([.n, .vals.n2] != (["A", "B"], ["B", "C"]))
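it's not quite the same: the comma makes the comparison produce one result per listed array, and select emits the item once for every result that is true, so it effectively behaves like an or (with duplicates).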
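To see the effect of the comma in isolation, it can help to run the comparison on plain numbers. This is only a minimal sketch, assuming jq is on the PATH; the variable names results and dupes are made up for illustration.

```shell
# The right-hand side (1, 2) is a generator, so != runs once per value.
results=$(jq -nc '[1 != (1, 2)]')
echo "$results"   # [false,true]

# select emits its input once for every true result of the condition,
# so a value that differs from both 1 and 2 comes out twice.
dupes=$(jq -nc '[3 | select(. != (1, 2))]')
echo "$dupes"     # [3,3]
```

That matches the output in the question: the A/C item differs from both pairs and therefore appears twice.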
You'll need to do something more like this:
select([.n, .vals.n2] as $v | $v != ["A", "B"] and $v != ["B", "C"])
or
select([.n, .vals.n2] as $v | all(["A", "B"], ["B", "C"]; $v != .))
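As a rough end-to-end check, the all variant can be run against the sample input from the question. This is a sketch, assuming jq 1.5 or later on the PATH; the shell variables input and kept are only illustrative names.

```shell
# Sample input from the question, compacted to one object per line.
input='[
  {"n":"A","a":659533330984,"vals":{"n2":"B","b":5193941030}},
  {"n":"A","a":659533330984,"vals":{"n2":"C","b":4872891707}},
  {"n":"B","a":659533330984,"vals":{"n2":"C","b":4872891707}}
]'

# Keep an item only if its [n, n2] pair differs from every listed pair.
kept=$(printf '%s' "$input" | jq -c '
  map(select([.n, .vals.n2] as $v
             | all(["A", "B"], ["B", "C"]; $v != .)))')
echo "$kept"
```

Only the A/C item should remain, and it should appear once, because all collapses the per-pair comparisons into a single boolean before select sees it.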
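Also, if you wanted to stick with your first approach, you would have to put the values in an array rather than just separating them with commas, and search for [$i] rather than $i, because index treats an array argument as a subsequence to look for:
select([.n, .vals.n2] as $i | [["A", "B"], ["B", "C"]] | index([$i]) | not)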
When using index to find the index of an array (say $x), you have to write:
index([$x])
(This has to do with the fact that index is designed to work in a uniform way on both JSON strings and arrays.)
[["A", "B"], ["B", "C"]] as $blacklist
| map( [.n, .vals.n2] as $i
| select( $blacklist | index([$i]) | not) )
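Since the question asks about reading the blacklist as an input, one possible way is to pass it in with --argjson (available since jq 1.5) instead of hard-coding it in the filter. This is a sketch under that assumption; the shell variables input, blacklist and kept are illustrative names only.

```shell
# Sample input from the question, compacted to one object per line.
input='[
  {"n":"A","a":659533330984,"vals":{"n2":"B","b":5193941030}},
  {"n":"A","a":659533330984,"vals":{"n2":"C","b":4872891707}},
  {"n":"B","a":659533330984,"vals":{"n2":"C","b":4872891707}}
]'

# The blacklist is ordinary JSON text, so it can come from anywhere.
blacklist='[["A","B"],["B","C"]]'

# --argjson binds the parsed JSON to $blacklist inside the filter.
kept=$(printf '%s' "$input" | jq -c --argjson blacklist "$blacklist" '
  map([.n, .vals.n2] as $i
      | select($blacklist | index([$i]) | not))')
echo "$kept"
```

If the blacklist lives in a file, something like --argjson blacklist "$(cat blacklist.json)" should work the same way; blacklist.json is a hypothetical filename here.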
Q: Given an array, A, containing an item, X, how can I find the least index of X in A? Why does [[1]] | index([1]) return null rather than 0? Why does [1,2] | index([1,2]) return 0 rather than null?
A: The simplest uniform method for finding the least index of X in an array is to query for [X] rather than X itself, that is: index([X]).
By contrast, the filter index([1,2]) attempts to find [1,2] as a subsequence of contiguous items in the input array. This is for uniformity with the behavior of t | index(s) where s and t are strings.
If X is not an array, then index([X]) may be abbreviated to index(X).
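The subsequence behaviour is easy to check from the command line. A small sketch, assuming jq is on the PATH; the variable names are made up for illustration.

```shell
# An array argument is treated as a run of contiguous items to find:
# [1,2] occurs as a subsequence of [1,2] starting at position 0.
subseq=$(jq -nc '[1, 2] | index([1, 2])')
echo "$subseq"     # 0

# Here [1,2] is a single element, and the items 1, 2 do not occur
# as a subsequence of the outer array, so nothing is found.
unwrapped=$(jq -nc '[[1, 2]] | index([1, 2])')
echo "$unwrapped"  # null

# Wrapping the needle in an array searches for the element itself.
wrapped=$(jq -nc '[[1, 2]] | index([[1, 2]])')
echo "$wrapped"    # 0
```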