
How to get nested keys of a JSON stream using jq

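I am trying to design some relational tables to hold the parsed output of various JSON streams. The streams have a fairly complex structure, and to make the table design easier I need to know the nested keys at every level of each stream. I am lost on how to get every nested key from a stream using jq. Below is a simplified, representative JSON stream.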

{
  "startAt": 0,
  "total": 5315,
  "issues": [
    {
      "id": "44269",
      "name": "someName",
      "fields": {
        "fixVersions": [
          {
            "id": "11401",
            "releaseDate": "2016-09-30"
          }
        ],
        "status": {
          "id": "10110",
          "statusCategory": {
            "id": 3,
            "name": "Done"
          }
        }
      }
    },
    {
      "id": "44270",
      "key": "LEAD-XXXX",
      "fields": {
        "assignee": {
          "id": "10111",
          "name": "Don"
        },
        "status": {
          "id": "10110",
          "statusCategory": {
            "id": 2,
            "name": "inProgress"
          }
        }
      }
    }
  ]
}

I am expecting output like the following, though I'd be very happy to have a better way that helps me design the tables.

startAt
total
issues: []
issues:id
issues:name
issues:key
issues:fields
issues:fields:fixVersions: []
issues:fields:fixVersions:id
issues:fields:fixVersions:releaseDate
issues:fields:status
issues:fields:status:id
issues:fields:status:statusCategory
issues:fields:status:statusCategory:id
issues:fields:status:statusCategory:name
issues:fields:assignee
issues:fields:assignee:id
issues:fields:assignee:name

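How can I get the nested keys of the above stream using jq? Many thanks for your help.

"I'd be very happy to have a better way..."

If I were you, I'd start (and maybe end) with: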

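# One path per leaf value; every array index is replaced by 0
[paths(scalars) | map(if type == "number" then 0 else . end)]
| unique   # drop duplicate paths (the result comes out sorted)
| .[]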

With your example and the -cr command-line options, this produces:

["issues",0,"fields","assignee","id"]
["issues",0,"fields","assignee","name"]
["issues",0,"fields","fixVersions",0,"id"]
["issues",0,"fields","fixVersions",0,"releaseDate"]
["issues",0,"fields","status","id"]
["issues",0,"fields","status","statusCategory","id"]
["issues",0,"fields","status","statusCategory","name"]
["issues",0,"id"]
["issues",0,"key"]
["issues",0,"name"]
["startAt"]
["total"]
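For readers who want to check this logic outside jq, here is a rough Python sketch of the same pipeline. It assumes the JSON text has already been parsed with the standard json module; leaf_paths, jq_sort_key and generic_paths are hypothetical names, and the sort key is an assumption meant to imitate jq's ordering (numbers sort before strings) so that sorting a set behaves like unique.

```python
import json
from typing import Any, Iterator


def leaf_paths(value: Any, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield the path to every scalar, roughly like jq's paths(scalars)."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, prefix + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from leaf_paths(child, prefix + (index,))
    elif prefix:
        # strings, numbers, booleans and null are all scalars in jq;
        # like paths/0, the empty root path is never reported
        yield prefix


def jq_sort_key(path: tuple) -> list:
    # jq orders numbers before strings; tagging each element keeps
    # mixed int/str paths comparable instead of raising TypeError
    return [(0, k) if isinstance(k, int) else (1, k) for k in path]


def generic_paths(doc: Any) -> list:
    """[paths(scalars) | map(if type == "number" then 0 else . end)] | unique"""
    # every array index becomes 0, so all elements share one path
    distinct = {tuple(0 if isinstance(k, int) else k for k in path)
                for path in leaf_paths(doc)}
    return [list(p) for p in sorted(distinct, key=jq_sort_key)]


if __name__ == "__main__":
    doc = json.loads('{"a": [{"b": 1}, {"b": 2, "c": null}], "d": true}')
    for p in generic_paths(doc):
        print(json.dumps(p, separators=(",", ":")))
    # ["a",0,"b"]
    # ["a",0,"c"]
    # ["d"]
```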

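You can get closer to what you've indicated you want by mapping the array indices to a string instead of 0, but then you have to be careful about potential conflicts between that string and key names. To illustrate: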

[paths(scalars) | map(if type == "number" then "[]" else . end)]
| unique
| .[]
| join(":")

produces:

issues:[]:fields:assignee:id
issues:[]:fields:assignee:name
issues:[]:fields:fixVersions:[]:id
issues:[]:fields:fixVersions:[]:releaseDate
issues:[]:fields:status:id
issues:[]:fields:status:statusCategory:id
issues:[]:fields:status:statusCategory:name
issues:[]:id
issues:[]:key
issues:[]:name
startAt
total

Note that this approach yields essentially the same results as an approach based on schema inference, which is a good thing.
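The naming conflict mentioned above is easy to reproduce. In this Python sketch (colon_paths is a hypothetical helper; the input is assumed to be parsed with json.loads), an array and an object whose key is literally "[]" both collapse to the same line, so the output alone cannot tell them apart.

```python
import json
from typing import Any, Iterator


def leaf_paths(value: Any, prefix: tuple = ()) -> Iterator[tuple]:
    """Paths to scalars, like jq's paths(scalars); the root itself is skipped."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, prefix + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from leaf_paths(child, prefix + (index,))
    elif prefix:
        yield prefix


def colon_paths(doc: Any, placeholder: str = "[]") -> list:
    """[paths(scalars) | map(if type == "number" then "[]" else . end)]
    | unique | .[] | join(":")"""
    distinct = {tuple(placeholder if isinstance(k, int) else k for k in path)
                for path in leaf_paths(doc)}
    # only strings remain, so plain tuple ordering matches jq's unique
    return [":".join(p) for p in sorted(distinct)]


if __name__ == "__main__":
    with_array = json.loads('{"a": [{"b": 1}]}')
    with_odd_key = json.loads('{"a": {"[]": {"b": 1}}}')
    print(colon_paths(with_array))    # ['a:[]:b']
    print(colon_paths(with_odd_key))  # ['a:[]:b'] (indistinguishable)
```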

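Using INDEX/2

Using unique/0 as above has two potential drawbacks: (1) the ordering of the output does not reflect the ordering in the data; (2) efficiency (although this is unlikely to be a real issue in practice, except perhaps for JSON texts with very many leaf paths).

In any case, INDEX/2 can be used instead of unique. If your jq does not have INDEX/2, its def is included below.

In brief: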

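# Needed only if your jq lacks INDEX/2 (it is not in jq 1.5)
def INDEX(stream; idx_expr):
  reduce stream as $row ({};
    .[$row|idx_expr|
      if type != "string" then tojson
      else .
      end] |= $row);

# Key each normalized path by its JSON text, so duplicates collapse
# while the order of first appearance is kept
INDEX(paths(scalars)
      | map(if type == "number" then "[]" else . end); .)
| .[]
| join(":")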

yields:

startAt
total
issues:[]:id
issues:[]:name
issues:[]:fields:fixVersions:[]:id
issues:[]:fields:fixVersions:[]:releaseDate
issues:[]:fields:status:id
issues:[]:fields:status:statusCategory:id
issues:[]:fields:status:statusCategory:name
issues:[]:key
issues:[]:fields:assignee:id
issues:[]:fields:assignee:name
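In Python terms, INDEX/2 behaves like deduplicating with an insertion-ordered dict: the first occurrence of a path fixes its position, and later duplicates only overwrite the stored value. A minimal sketch, assuming parsed JSON input and Python 3.7+ (where dicts preserve insertion order); first_seen_paths is a hypothetical name.

```python
import json
from typing import Any, Iterator


def leaf_paths(value: Any, prefix: tuple = ()) -> Iterator[tuple]:
    """Paths to scalars in document order, like jq's paths(scalars)."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, prefix + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from leaf_paths(child, prefix + (index,))
    elif prefix:
        yield prefix


def first_seen_paths(doc: Any) -> list:
    """INDEX(paths(scalars) | map(if type == "number" then "[]" else . end); .)
    | .[] | join(":")"""
    index: dict = {}
    for path in leaf_paths(doc):
        row = ["[]" if isinstance(k, int) else k for k in path]
        # like INDEX/2, key each row by its JSON text; reassigning an
        # existing key keeps that key's original position
        index[json.dumps(row)] = row
    return [":".join(row) for row in index.values()]


if __name__ == "__main__":
    doc = json.loads('{"b": [{"y": 1}, {"x": 2, "y": 3}], "a": 0}')
    print(first_seen_paths(doc))  # ['b:[]:y', 'b:[]:x', 'a']
```

In this sketch no sort is needed, which is the source of both the ordering difference and (in principle) the efficiency gain.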

Paths to empty arrays

If you want the paths to empty arrays reported as well, you could (for example) simply change "paths(scalars)" to "(paths(scalars),paths(arrays))".
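To see what that change does, here is a Python sketch of [(paths(scalars),paths(arrays)) | map(if type == "number" then "[]" else . end)] | unique | .[] | join(":"), again assuming parsed input; tagged_paths and colon_paths_with_arrays are hypothetical names. Note that paths(arrays) selects every array, so non-empty arrays (items in the example) are reported too, not just empty ones.

```python
import json
from typing import Any, Iterator


def tagged_paths(value: Any, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield (path, value) for every non-root path, in document order."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return
    for key, child in items:
        path = prefix + (key,)
        yield path, child
        yield from tagged_paths(child, path)


def colon_paths_with_arrays(doc: Any) -> list:
    # paths(scalars): anything that is neither an object nor an array
    scalar_paths = [p for p, v in tagged_paths(doc) if not isinstance(v, (dict, list))]
    # paths(arrays): every array, empty or not
    array_paths = [p for p, v in tagged_paths(doc) if isinstance(v, list)]
    distinct = {tuple("[]" if isinstance(k, int) else k for k in p)
                for p in scalar_paths + array_paths}
    return [":".join(p) for p in sorted(distinct)]


if __name__ == "__main__":
    doc = json.loads('{"tags": [], "items": [{"id": 1}]}')
    print(colon_paths_with_arrays(doc))  # ['items', 'items:[]:id', 'tags']
```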

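If you want a schematic representation of the data, you might want to consider an approach based on schema inference.

For example, using the schema function defined at https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed, your input would produce the following inferred schema: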

{
  "startAt": "number",
  "total": "number",
  "issues": [
    {
      "fields": {
        "assignee": {
          "id": "string",
          "name": "string"
        },
        "fixVersions": [
          {
            "id": "string",
            "releaseDate": "string"
          }
        ],
        "status": {
          "id": "string",
          "statusCategory": {
            "id": "number",
            "name": "string"
          }
        }
      },
      "id": "string",
      "key": "string",
      "name": "string"
    }
  ]
}

If you then run this through paths(scalars), you get:

["startAt"]
["total"]
["issues",0,"fields","assignee","id"]
["issues",0,"fields","assignee","name"]
["issues",0,"fields","fixVersions",0,"id"]
["issues",0,"fields","fixVersions",0,"releaseDate"]
["issues",0,"fields","status","id"]
["issues",0,"fields","status","statusCategory","id"]
["issues",0,"fields","status","statusCategory","name"]
["issues",0,"id"]
["issues",0,"key"]
["issues",0,"name"]

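Apart from the ordering, these results are the same as those obtained with the more direct approach, which I'd suggest serves as a validation of both approaches.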
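The schema function in the gist is not reproduced here. As a rough illustration of the idea only, the following Python sketch uses its own, hypothetical inference rules (arrays collapse to one merged element schema, scalars become jq type names, conflicting types become "mixed") and then lists the scalar paths of the inferred schema, the analogue of paths(scalars). On the sample input these particular rules give the schema shown above, up to key order.

```python
import json
from typing import Any, Iterator


def infer(value: Any) -> Any:
    """Hypothetical schema inference: objects keep their keys, arrays hold
    one merged element schema, scalars are replaced by their jq type name."""
    if isinstance(value, dict):
        return {k: infer(v) for k, v in value.items()}
    if isinstance(value, list):
        merged = None
        for item in value:
            merged = merge(merged, infer(item))
        return [] if merged is None else [merged]
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "null"


def merge(a: Any, b: Any) -> Any:
    """Combine two schemas; None means "nothing seen yet"."""
    if a is None:
        return b
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, sub in b.items():
            out[key] = merge(out.get(key), sub)
        return out
    if isinstance(a, list) and isinstance(b, list):
        # an empty array contributes no element schema
        return [merge(a[0], b[0])] if a and b else (a or b)
    # assumption: equal types stay as-is, conflicting types become "mixed"
    return a if a == b else "mixed"


def leaf_paths(value: Any, prefix: tuple = ()) -> Iterator[tuple]:
    """Paths to scalars, like jq's paths(scalars) (root excluded)."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, prefix + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from leaf_paths(child, prefix + (index,))
    elif prefix:
        yield prefix


if __name__ == "__main__":
    doc = json.loads('{"rows": [{"id": 1}, {"id": 2, "tag": "x"}]}')
    schema = infer(doc)
    print(json.dumps(schema))  # {"rows": [{"id": "number", "tag": "string"}]}
    for p in leaf_paths(schema):
        print(list(p))
    # ['rows', 0, 'id']
    # ['rows', 0, 'tag']
```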

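paths is definitely the way to go, but getting the exact output you want takes a bit of work. Here is a filter that produces it, except for the exact ordering: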

def normalize:    # convert paths to requested structure
    if .[-1]|type=="number" then .[-1]="[]" else . end
  | map(select(type!="number"));

def collect:      # collect unique normalized paths into an object
  reduce (paths|normalize) as $p (
     {}
   ; if getpath($p)==null then setpath($p;null) else . end
  );

def colonize($p): # convert object back into : separated paths
    keys_unsorted[] as $k
  | (if $p=="" then $k else "\($p):\($k)" end) as $n
  | $n, (.[$k] | if type=="object" then colonize($n) else empty end);

def summary:      # final output, omitting "foo" whenever "foo:[]" is present
    [ collect | colonize("") ]
  | map(select(endswith(":[]"))|.[:-3]) as $remove
  | map(select($remove[[.]]==[]));

summary[]

Sample run (assuming the filter is in filter.jq and the data in data.json):

$ jq -Mcr -f filter.jq data.json
startAt
total
issues:[]
issues:id
issues:name
issues:fields
issues:fields:fixVersions:[]
issues:fields:fixVersions:id
issues:fields:fixVersions:releaseDate
issues:fields:status
issues:fields:status:id
issues:fields:status:statusCategory
issues:fields:status:statusCategory:id
issues:fields:status:statusCategory:name
issues:fields:assignee
issues:fields:assignee:id
issues:fields:assignee:name
issues:key
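For comparison, here is a Python sketch that mirrors the four jq definitions step by step (all_paths stands in for paths/0; the other names follow the filter). It assumes the input is already parsed and is an illustration, not a drop-in replacement; one deliberate difference is that collect uses empty dicts rather than null as leaf placeholders, which does not change what colonize emits.

```python
import json
from typing import Any, Iterator


def all_paths(value: Any, prefix: tuple = ()) -> Iterator[tuple]:
    """Every non-root path in pre-order, like jq's paths/0."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return
    for key, child in items:
        path = prefix + (key,)
        yield path
        yield from all_paths(child, path)


def normalize(path: tuple) -> tuple:
    # a trailing index means "element of an array": record it as "[]"
    if isinstance(path[-1], int):
        path = path[:-1] + ("[]",)
    # interior indices carry no structural information, so drop them
    return tuple(k for k in path if not isinstance(k, int))


def collect(doc: Any) -> dict:
    """Merge the normalized paths into a nested, insertion-ordered dict."""
    tree: dict = {}
    for path in map(normalize, all_paths(doc)):
        node = tree
        for key in path:
            node = node.setdefault(key, {})
    return tree


def colonize(tree: dict, prefix: str = "") -> Iterator[str]:
    """Turn the nested dict back into colon-separated names."""
    for key, sub in tree.items():
        name = key if prefix == "" else f"{prefix}:{key}"
        yield name
        yield from colonize(sub, name)


def summary(doc: Any) -> list:
    names = list(colonize(collect(doc)))
    # "foo" is redundant when "foo:[]" is also present
    redundant = {n[:-3] for n in names if n.endswith(":[]")}
    return [n for n in names if n not in redundant]


if __name__ == "__main__":
    doc = json.loads('{"a": [{"b": 1}], "c": {"d": 2}}')
    print("\n".join(summary(doc)))
    # a:[]
    # a:b
    # c
    # c:d
```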


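Note that there is a problem here with empty arrays. If your data contains empty arrays, this filter will report them as ordinary fields, because the corresponding paths returned by paths will not end in a number. The simplest way to compensate is to first map empty arrays to something non-empty, such as [{}]. For example: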

def walk(f):  # defined here in case your jq doesn't have it
    . as $in
  | if type == "object" then reduce keys_unsorted[] as $key (
        {}; . + { ($key):  ($in[$key] | walk(f)) } ) | f
    elif type == "array" then map( walk(f) ) | f
    else f
    end;

  walk(if .==[] then [{}] else . end)
| summary[]
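The padding step can be sketched in Python as well (walk and pad_empty_arrays are hypothetical helpers written for this illustration, not a library API). After padding, the path to each placeholder element ends in an index, so normalize turns it into "...:[]" instead of reporting a plain field.

```python
from typing import Any, Callable


def walk(value: Any, f: Callable[[Any], Any]) -> Any:
    """Bottom-up rewrite in the spirit of jq's walk/1: children first, then f."""
    if isinstance(value, dict):
        value = {k: walk(v, f) for k, v in value.items()}
    elif isinstance(value, list):
        value = [walk(v, f) for v in value]
    return f(value)


def pad_empty_arrays(doc: Any) -> Any:
    # walk(if . == [] then [{}] else . end)
    return walk(doc, lambda v: [{}] if v == [] else v)


if __name__ == "__main__":
    doc = {"tags": [], "n": 1, "nested": {"more": []}}
    print(pad_empty_arrays(doc))
    # {'tags': [{}], 'n': 1, 'nested': {'more': [{}]}}
```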

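For the record: it is quite easy to write a jq filter that produces output in the format originally envisioned, even though that format is not very common.

The following approach does not need walk/1 to handle empty arrays as a special case. It uses unique only because INDEX/2 is not included in jq 1.5 (*).

With the sample input and the -r command-line option, the following: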

 [paths as $p
  | if (getpath($p)|type) == "array" then $p + [" []"]  # mark every array
    elif ($p[-1]|type) == "number" then empty             # skip array elements
    else $p
    end
    | map(select(type != "number"))]                     # drop remaining indices
 | unique
 | .[]
 | join(":")

produces:

issues: []
issues:fields
issues:fields:assignee
issues:fields:assignee:id
issues:fields:assignee:name
issues:fields:fixVersions: []
issues:fields:fixVersions:id
issues:fields:fixVersions:releaseDate
issues:fields:status
issues:fields:status:id
issues:fields:status:statusCategory
issues:fields:status:statusCategory:id
issues:fields:status:statusCategory:name
issues:id
issues:key
issues:name
startAt
total

(*) The use of unique can easily be avoided by using INDEX/2, as described elsewhere on this page.
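Finally, a Python sketch of this last filter, assuming parsed input (tagged_paths and key_summary are hypothetical names). Arrays are tested before the trailing-index check, in the same order as the jq if/elif, so an empty array still yields its " []" line without any preprocessing.

```python
import json
from typing import Any, Iterator


def tagged_paths(value: Any, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield (path, value) for every non-root path, like paths as $p | getpath($p)."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return
    for key, child in items:
        path = prefix + (key,)
        yield path, child
        yield from tagged_paths(child, path)


def key_summary(doc: Any) -> list:
    distinct = set()
    for path, value in tagged_paths(doc):
        if isinstance(value, list):
            path = path + (" []",)  # arrays, empty or not, get the " []" marker
        elif isinstance(path[-1], int):
            continue                # an array element adds nothing new
        distinct.add(tuple(k for k in path if not isinstance(k, int)))
    # only strings remain, so tuple ordering matches jq's unique
    return [":".join(p) for p in sorted(distinct)]


if __name__ == "__main__":
    doc = json.loads('{"tags": [], "rows": [{"id": 1}, {"id": 2}]}')
    print("\n".join(key_summary(doc)))
    # rows: []
    # rows:id
    # tags: []
```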


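Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.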

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM