How to get the intersection of two JSON arrays using jq
Given arrays X and Y (preferably both as inputs, but otherwise with one as input and the other hardcoded), how can I use jq to output the array containing all elements common to both? E.g., what is a value of f such that
echo '[1,2,3,4]' | jq 'f([2,4,6,8,10])'
would output
[2,4]
?
I've tried the following:
map(select(in([2,4,6,8,10]))) --> outputs [1,2,3,4]
select(map(in([2,4,6,8,10]))) --> outputs [1,2,3,4,5]
A simple and quite fast (but somewhat naive) filter that probably does essentially what you want can be defined as follows:
# x and y are arrays
def intersection(x;y):
  ( (x|unique) + (y|unique) | sort) as $sorted
  | reduce range(1; $sorted|length) as $i
      ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
If x is provided as input on STDIN, and y is provided in some other way (e.g. def y: ...), then you could use this as: intersection(.;y)
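As a minimal sketch of that usage, the def can be placed inline in the jq program together with a hardcoded y. The value chosen for y here is only the array from the question, and the shell variable name result is an arbitrary choice for illustration.

```shell
# x arrives on STDIN; y is hardcoded as a jq def (illustrative value from the question)
result=$(echo '[1,2,3,4]' | jq -c '
  def intersection(x;y):
    ( (x|unique) + (y|unique) | sort) as $sorted
    | reduce range(1; $sorted|length) as $i
        ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
  def y: [2,4,6,8,10];
  intersection(.;y)
')
echo "$result"   # prints [2,4]
```

The approach relies on each array being deduplicated first, so that any value appearing twice in a row in the merged, sorted array must have come from both inputs.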
Other ways to provide two distinct arrays as input include:
- using the --slurp option
- using --arg a v (or --argjson a v if available in your jq)
Here's a simpler but slower def that's nevertheless quite fast in practice:
def i(x;y):
  if (y|length) == 0 then []
  else (x|unique) as $x
  | $x - ($x - y)
  end ;
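The input options listed above can be combined with this def roughly as follows. This is a sketch, assuming a jq that supports --argjson; the names I_DEF, slurped, viaarg and the jq variable $y are arbitrary choices for illustration.

```shell
# The def from the answer, kept in a shell variable so both invocations can reuse it
I_DEF='def i(x;y): if (y|length) == 0 then [] else (x|unique) as $x | $x - ($x - y) end;'

# 1. Both arrays on STDIN; --slurp collects them into a single array of arrays
slurped=$(echo '[1,2,3,4] [2,4,6,8,10]' | jq -c --slurp "$I_DEF"' i(.[0]; .[1])')

# 2. First array on STDIN, second array passed as a JSON-valued variable
viaarg=$(echo '[1,2,3,4]' | jq -c --argjson y '[2,4,6,8,10]' "$I_DEF"' i(.; $y)')

echo "$slurped"  # prints [2,4]
echo "$viaarg"   # prints [2,4]
```

Note that plain --arg would pass the second array as a string rather than as JSON, so it would need to be parsed (for example with fromjson) before use.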
Here's a standalone filter for finding the intersection of arbitrarily many arrays:
# Input: an array of arrays
def intersection:
  def i(y): ((unique + (y|unique)) | sort) as $sorted
    | reduce range(1; $sorted|length) as $i
        ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
  reduce .[1:][] as $a (.[0]; i($a)) ;
Examples:
[ [1,2,4], [2,4,5], [4,5,6]] #=> [4]
[[]] #=> []
[] #=> null
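One way to run the first example from the shell is sketched below, with the def placed inline in the program; the shell variable name result is an arbitrary choice.

```shell
# Input is a single array of arrays on STDIN
result=$(echo '[[1,2,4],[2,4,5],[4,5,6]]' | jq -c '
  def intersection:
    def i(y): ((unique + (y|unique)) | sort) as $sorted
      | reduce range(1; $sorted|length) as $i
          ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
    reduce .[1:][] as $a (.[0]; i($a)) ;
  intersection
')
echo "$result"   # prints [4]
```

The outer reduce starts from the first array and intersects the running result with each remaining array in turn, which is why an empty outer array yields null (there is no first element to start from).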
Of course, if x and y are already known to be sorted and/or unique, more efficient solutions are possible. See in particular "Finite Sets of JSON Entities".
The complexity of all these answers obscured the underlying principle. That's unfortunate, because the principle is simple:
- array1 minus array2 returns:
  - everything that's left in array1
  - after removing everything that is in array2
  - (and discarding the rest of array2)
# From array1, subtract array2, leaving the remainder
$ jq --null-input '[1,2,3,4] - [2,4,6,8]'
[
1,
3
]
# Subtract the remainder from the original
$ jq --null-input '[1,2,3,4] - [1,3]'
[
2,
4
]
# Put it all together
$ jq --null-input '[1,2,3,4] - ([1,2,3,4] - [2,4,6,8])'
[
2,
4
]
comm
Demo:
def comm:
  (.[0] - (.[0] - .[1])) as $d |
  [.[0]-$d, .[1]-$d, $d]
;
With that understanding, I was able to imitate the behavior of the *nix comm command:
"With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files."
$ echo 'def comm: (.[0]-(.[0]-.[1])) as $d | [.[0]-$d,.[1]-$d, $d];' > comm.jq
$ echo '{"a":101, "b":102, "c":103, "d":104}' > 1.json
$ echo '{ "b":202, "d":204, "f":206, "h":208}' > 2.json
$ jq --slurp '.' 1.json 2.json
[
{
"a": 101,
"b": 102,
"c": 103,
"d": 104
},
{
"b": 202,
"d": 204,
"f": 206,
"h": 208
}
]
$ jq --slurp '[.[] | keys | sort]' 1.json 2.json
[
[
"a",
"b",
"c",
"d"
],
[
"b",
"d",
"f",
"h"
]
]
$ jq --slurp 'include "comm"; [.[] | keys | sort] | comm' 1.json 2.json
[
[
"a",
"c"
],
[
"f",
"h"
],
[
"b",
"d"
]
]
$ jq --slurp 'include "comm"; [.[] | keys | sort] | comm[2]' 1.json 2.json
[
"b",
"d"
]
Here is a solution which works by counting occurrences of elements in the arrays using foreach:
[
foreach ($X[], $Y[]) as $r (
{}
; .[$r|tostring] += 1
; if .[$r|tostring] == 2 then $r else empty end
)
]
If this filter is in filter.jq, then
jq -M -n -c --argjson X '[1,2,3,4]' --argjson Y '[2,4,6,8,10]' -f filter.jq
will produce
[2,4]
It assumes there are no duplicates in the initial arrays. If that's not the case, then it is easy to compensate with unique. E.g.
[
foreach (($X|unique)[], ($Y|unique)[]) as $r (
{}
; .[$r|tostring] += 1
; if .[$r|tostring] == 2 then $r else empty end
)
]
$ echo '[1,2,3,4] [2,4,6,8,10]' | jq --slurp '[.[0][] as $x | .[1][] | select($x == .)]'
[
2,
4
]