
How to get the intersection of two JSON arrays using jq

Given arrays X and Y (preferably both as inputs, but otherwise with one as input and the other hardcoded), how can I use jq to output the array containing all elements common to both? E.g., what is a value of f such that

echo '[1,2,3,4]' | jq 'f([2,4,6,8,10])'

would output

[2,4]

?

I've tried the following:

map(select(in([2,4,6,8,10])))  --> outputs [1,2,3,4]
select(map(in([2,4,6,8,10])))  --> outputs [1,2,3,4,5]
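A note on why the first attempt returns the input unchanged: in(xs) is the inverse of has, so for an array it tests whether the input is a valid index of xs, not whether it is one of its values. The sketch below illustrates this and shows one value-based alternative using any; the wrapper names in_check and by_value are hypothetical, and jq 1.5 or later is assumed for --argjson.

```shell
# Hypothetical helper: shows what `in` actually tests.
# For an array argument, `in` asks "is the input a valid index?".
in_check() {
  jq -c 'map(in([2,4,6,8,10]))'
}

# Hypothetical helper: keeps the input elements that equal some element of $y.
# $1 is the second array, passed as JSON text.
by_value() {
  jq -c --argjson y "$1" 'map(select(. as $x | $y | any(. == $x)))'
}

echo '[1,2,3,4,7]' | in_check                 # [true,true,true,true,false]
echo '[1,2,3,4]'   | by_value '[2,4,6,8,10]'  # [2,4]
```

Indices 1 to 4 exist in a five-element array, which is why every element of [1,2,3,4] passes the in test.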

A simple and quite fast (but somewhat naive) filter that probably does essentially what you want can be defined as follows:

   # x and y are arrays
   def intersection(x;y):
     ( (x|unique) + (y|unique) | sort) as $sorted
     | reduce range(1; $sorted|length) as $i
         ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;

If x is provided as input on STDIN, and y is provided in some other way (e.g. def y: ... ), then you could use this as: intersection(.;y)

Other ways to provide two distinct arrays as input include:

  • using the --slurp option
  • using --arg a v (or --argjson a v if available in your jq); note that --arg passes v as a string, so an array passed that way must be parsed with fromjson
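The input options above can be sketched as follows, reusing the intersection(x;y) definition from this answer. The wrapper names via_argjson and via_slurp are hypothetical, and --argjson assumes jq 1.5 or later.

```shell
# Hypothetical wrapper: x comes from STDIN, y is passed as JSON via --argjson.
via_argjson() {
  jq -c --argjson y "$1" '
    def intersection(x;y):
      ( (x|unique) + (y|unique) | sort) as $sorted
      | reduce range(1; $sorted|length) as $i
          ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end);
    intersection(.; $y)'
}

# Hypothetical wrapper: both arrays arrive on STDIN and --slurp collects them
# into a single array, so x is .[0] and y is .[1].
via_slurp() {
  jq -c --slurp '
    def intersection(x;y):
      ( (x|unique) + (y|unique) | sort) as $sorted
      | reduce range(1; $sorted|length) as $i
          ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end);
    intersection(.[0]; .[1])'
}

echo '[1,2,3,4]' | via_argjson '[2,4,6,8,10]'   # [2,4]
echo '[1,2,3,4] [2,4,6,8,10]' | via_slurp       # [2,4]
```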

Here's a simpler but slower def that's nevertheless quite fast in practice:

    def i(x;y):
       if (y|length) == 0 then []
       else (x|unique) as $x
       | $x - ($x - y)
       end ;
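As a quick check of how this definition treats unsorted input with duplicates, here is a minimal sketch. The wrapper name intersect_simple is hypothetical, and y is supplied through --argjson (jq 1.5 or later assumed).

```shell
# Hypothetical wrapper around the i(x;y) definition above.
# x is read from STDIN; $1 is y as JSON text.
intersect_simple() {
  jq -c --argjson y "$1" '
    def i(x;y):
      if (y|length) == 0 then []
      else (x|unique) as $x
      | $x - ($x - y)
      end;
    i(.; $y)'
}

# x|unique is [1,2,3]; removing y leaves [1]; removing [1] from [1,2,3] leaves [2,3].
echo '[3,1,2,2]' | intersect_simple '[2,3,3,9]'   # [2,3]
```

Because x is passed through unique first, the result is sorted and free of duplicates regardless of the order of the input.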

Here's a standalone filter for finding the intersection of arbitrarily many arrays:

# Input: an array of arrays
def intersection:
  def i(y): ((unique + (y|unique)) | sort) as $sorted
  | reduce range(1; $sorted|length) as $i
       ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
  reduce .[1:][] as $a (.[0]; i($a)) ;

Examples:

[ [1,2,4], [2,4,5], [4,5,6]] #=> [4]
[[]]                         #=> []
[]                           #=> null
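One way to feed this filter, sketched under the assumption that the arrays arrive as separate JSON values on STDIN: --slurp gathers them into the array of arrays that intersection expects. The wrapper name intersect_all is hypothetical.

```shell
# Hypothetical wrapper: slurps every array on STDIN into one array of arrays,
# then applies the n-ary intersection filter defined above.
intersect_all() {
  jq -c --slurp '
    def intersection:
      def i(y): ((unique + (y|unique)) | sort) as $sorted
        | reduce range(1; $sorted|length) as $i
            ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end);
      reduce .[1:][] as $a (.[0]; i($a));
    intersection'
}

echo '[1,2,4] [2,4,5] [4,5,6]' | intersect_all   # [4]
echo '[1,1,2] [2,1]'           | intersect_all   # [1,2]
```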

Of course, if x and y are already known to be sorted and/or unique, more efficient solutions are possible. See in particular Finite Sets of JSON Entities.

Simple Explanation

The complexity of all these answers obscures the underlying principle. That's unfortunate, because the principle is simple:

  • array1 minus array2 returns:
      • everything that's left in array1
      • after removing everything that is in array2
      • (and discarding the rest of array2)

Simple Demo

# From array1, subtract array2, leaving the remainder
$ jq --null-input '[1,2,3,4] - [2,4,6,8]'
[
  1,
  3
]

# Subtract the remainder from the original
$ jq --null-input '[1,2,3,4] - [1,3]'
[
  2,
  4
]

# Put it all together
$ jq --null-input '[1,2,3,4] - ([1,2,3,4] - [2,4,6,8])'
[
  2,
  4
]

comm Demo

def comm:
  (.[0] - (.[0] - .[1])) as $d |
    [.[0]-$d, .[1]-$d, $d]
;

With that understanding, I was able to imitate the behavior of the *nix comm command:

With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.

$ echo 'def comm: (.[0]-(.[0]-.[1])) as $d | [.[0]-$d,.[1]-$d, $d];' > comm.jq
$ echo '{"a":101, "b":102, "c":103, "d":104}'                        > 1.json
$ echo '{         "b":202,          "d":204, "f":206, "h":208}'      > 2.json

$ jq --slurp '.' 1.json 2.json
[
  {
    "a": 101,
    "b": 102,
    "c": 103,
    "d": 104
  },
  {
    "b": 202,
    "d": 204,
    "f": 206,
    "h": 208
  }
]

$ jq --slurp '[.[] | keys | sort]' 1.json 2.json
[
  [
    "a",
    "b",
    "c",
    "d"
  ],
  [
    "b",
    "d",
    "f",
    "h"
  ]
]

$ jq --slurp 'include "comm"; [.[] | keys | sort] | comm' 1.json 2.json
[
  [
    "a",
    "c"
  ],
  [
    "f",
    "h"
  ],
  [
    "b",
    "d"
  ]
]

$ jq --slurp 'include "comm"; [.[] | keys | sort] | comm[2]' 1.json 2.json
[
  "b",
  "d"
]

Here is a solution which works by counting occurrences of elements in the arrays using foreach:

[
  foreach ($X[], $Y[]) as $r (
    {}
  ; .[$r|tostring] += 1
  ; if .[$r|tostring] == 2 then $r else empty end
  )
]

If this filter is in filter.jq, then

jq -M -n -c --argjson X '[1,2,3,4]' --argjson Y '[2,4,6,8,10]' -f filter.jq

will produce

[2,4]

It assumes there are no duplicates in the initial arrays. If that's not the case, then it is easy to compensate with unique. E.g.

[
  foreach (($X|unique)[], ($Y|unique)[]) as $r (
    {}
  ; .[$r|tostring] += 1
  ; if .[$r|tostring] == 2 then $r else empty end
  )
]
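One caveat with the counting approach: tostring leaves strings unchanged, so the number 1 and the string "1" end up under the same counter key. If the arrays may mix such values, a possible variation (an assumption on my part, not part of the answer above) is to key the counters with tojson instead. The wrapper name count_intersect is hypothetical.

```shell
# Hypothetical wrapper: $1 and $2 are the two arrays as JSON text.
# Same counting technique as above, but keyed by tojson so that
# the number 2 and the string "2" are counted separately.
count_intersect() {
  jq -c -n --argjson X "$1" --argjson Y "$2" '
    [ foreach (($X|unique)[], ($Y|unique)[]) as $r (
        {};
        .[$r|tojson] += 1;
        if .[$r|tojson] == 2 then $r else empty end
      ) ]'
}

# Only the number 2 occurs in both arrays.
count_intersect '[1,2,"2"]' '["1",2]'        # [2]
count_intersect '[1,2,3,4]' '[2,4,6,8,10]'   # [2,4]
```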

$ echo '[1,2,3,4] [2,4,6,8,10]' | jq --slurp '[.[0][] as $x | .[1][] | select($x == .)]'
[
  2,
  4
]

