Remove matching/non-matching elements of a nested array using jq

I need to split the results of a SonarQube analysis history into individual files. Assuming the starting input below,

{
  "paging": {
    "pageIndex": 1,
    "pageSize": 100,
    "total": 3
  },
  "measures": [
    {
      "metric": "coverage",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "100.0"
        },
        {
          "date": "2018-11-21T12:22:39+0000",
          "value": "100.0"
        },
        {
          "date": "2018-11-21T13:09:02+0000",
          "value": "100.0"
        }
      ]
    },
    {
      "metric": "bugs",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T12:22:39+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T13:09:02+0000",
          "value": "0"
        }
      ]
    },
    {
      "metric": "vulnerabilities",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T12:22:39+0000",
          "value": "0"
        },
        {
          "date": "2018-11-21T13:09:02+0000",
          "value": "0"
        }
      ]
    }
  ]
}

How do I use jq to clean the results so that, in each element, only the history entries for a single date are retained? The desired output is something like this (output-20181118123808.json for the analysis done on "2018-11-18T12:37:08+0000"):

{
  "paging": {
    "pageIndex": 1,
    "pageSize": 100,
    "total": 3
  },
  "measures": [
    {
      "metric": "coverage",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "100.0"
        }
      ]
    },
    {
      "metric": "bugs",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        }
      ]
    },
    {
      "metric": "vulnerabilities",
      "history": [
        {
          "date": "2018-11-18T12:37:08+0000",
          "value": "0"
        }
      ]
    }
  ]
}

I am lost on how to operate only on the sub-elements while leaving the parent structure intact. The naming of the JSON files will be handled externally to the jq utility. The sample data provided would be split into 3 files. Other inputs can have a variable number of entries, some with up to 10000. Thanks.

Here is a solution which uses awk to write the distinct files. The solution assumes that the dates for each measure are the same and in the same order, but imposes no limit on the number of distinct dates or on the number of distinct measures.

jq -cr 'range(0; .measures[0].history|length) as $i
  | (.measures[0].history[$i].date|gsub("[^0-9]";"")),  # basis of the filename
    reduce range(0; .measures|length) as $j (.;
      .measures[$j].history |= [.[$i]])' input.json |
awk -F\\t 'fn {print >> fn; fn=""; next} {fn="output-" $1 ".json"}'

Comments

The choice of awk here is just for convenience.

The disadvantage of this approach is that if each file is to be neatly formatted, an additional run of a pretty-printer (such as jq) would be required for each file. Thus, if the output in each file must be neat, a case could be made for running jq once per date, obviating the need for the post-processing (awk) step; a sketch of that per-date variation follows.
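
For example, here is a minimal sketch of that per-date variation, assuming bash and the same lock-step dates as above; input.json and the output-*.json naming scheme are illustrative, not part of the original question:

#!/bin/bash
# Run jq once per date position so each file comes out pretty-printed,
# with no awk post-processing step.
n=$(jq '.measures[0].history | length' input.json)
for ((i=0; i<n; i++)); do
  # Digits-only form of the i-th date, used in the filename
  d=$(jq -r ".measures[0].history[$i].date | gsub(\"[^0-9]\";\"\")" input.json)
  # Keep only the i-th history entry in every measure
  jq --argjson i "$i" '.measures |= map(.history |= [.[$i]])' input.json \
    > "output-$d.json"
done

Note that this invokes jq 2n+1 times for n dates, so for very large inputs (e.g. the 10000-entry case mentioned in the question) the single-pass pipeline above will be considerably faster.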

If the dates of the measures are not in lock-step, then the same approach as above could still be used, but of course the gathering of the dates and of the corresponding measures would have to be done differently; the date-based variation below addresses this.

Output

The first two lines produced by the invocation of jq above are as follows:

201811181237080000
{"paging":{"pageIndex":1,"pageSize":100,"total":3},"measures":[{"metric":"coverage","history":[{"date":"2018-11-18T12:37:08+0000","value":"100.0"}]},{"metric":"bugs","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]},{"metric":"vulnerabilities","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]}]}

In the comments, the following addendum to the original question appeared:

Is there a variation wherein the filtering is based on the date value and not the position? It is not guaranteed that the order will be the same, or that the number of elements in each metric will be the same (i.e. some dates may be missing "bugs", and some might have an additional metric such as "complexity").

The following will produce a stream of JSON objects, one per date. This stream can be annotated with the date as per my previous answer, which shows how to use these annotations to create the various files; a sketch combining the two is given after the program. For ease of understanding, we use two helper functions:

def dates:
  INDEX(.measures[].history[].date; .)
  | keys;

def gather($date): map(select(.date==$date));

dates[] as $date
| .measures |= map( .history |= gather($date) )
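
To connect this back to the file-writing step, the stream can be annotated with a digits-only date exactly as in the first solution. Here is a sketch, reusing the same awk post-processing; it assumes jq 1.6+ for the INDEX builtin (otherwise, add the def given below):

jq -cr 'def dates: INDEX(.measures[].history[].date; .) | keys;
  def gather($date): map(select(.date==$date));
  dates[] as $date
  | ($date | gsub("[^0-9]";"")),   # digits-only date: basis of the filename
    (.measures |= map(.history |= gather($date)))' input.json |
awk -F\\t 'fn {print >> fn; fn=""; next} {fn="output-" $1 ".json"}'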

INDEX/2

If your jq does not have INDEX/2, now would be an excellent time to upgrade; but in case that's not feasible, here is its def:

def INDEX(stream; idx_expr):
  reduce stream as $row ({};
    .[$row|idx_expr|
      if type != "string" then tojson
      else .
      end] |= $row);
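
As a quick sanity check of what INDEX/2 contributes here, using made-up dates d1 and d2 rather than the sample data:

echo '{"measures":[{"history":[{"date":"d2"},{"date":"d1"}]},{"history":[{"date":"d1"}]}]}' |
jq 'INDEX(.measures[].history[].date; .) | keys'

This prints ["d1","d2"]: each date becomes an object key, so duplicates collapse, and keys then returns the distinct dates in sorted order.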
