
Parsing multiple key/value pairs in a JSON tree with jq

Using jq, I'd like to cherry-pick key/value pairs from the following JSON:

{
  "project": "Project X",
  "description": "This is a description of Project X",
  "nodes": [
    {
      "name": "server001",
      "detail001": "foo",
      "detail002": "bar",
      "networks": [
        {
          "net_tier": "network_tier_001",
          "ip_address": "10.1.1.10",
          "gateway": "10.1.1.1",
          "subnet_mask": "255.255.255.0",
          "mac_address": "00:11:22:aa:bb:cc"
        }
      ],
      "hardware": {
        "vcpu": 1,
        "mem": 1024,
        "disks": [
          {
            "disk001": 40,
            "detail001": "foo"
          },
          {
            "disk002": 20,
            "detail001": "bar"
          }
        ]
      },
      "os": "debian8",
      "geo": {
        "region": "001",
        "country": "Sweden",
        "datacentre": "Malmo"
      },
      "detail003": "baz"
    }
  ],
  "detail001": "foo"
}

For example, I'd like to extract the following keys and their values: "project", "name", "net_tier", "vcpu", "mem", "disk001", "disk002".

I can extract individual elements without much trouble, but because the data is hierarchical, I haven't had much luck descending into different branches at the same time (i.e. both networks and hardware > disks).

Any help appreciated.

Edit:

To clarify, the output I'm after is CSV. As for handling every combination, covering the sample data in the example will do for now; I should be able to extend any suggestions from there.

Here's one way you could achieve the desired output.

program.jq:

# header row
["project","name","net_tier","vcpu","mem","disk001","disk002"],
# one data row per (node, network) pair
  [.project]
+ (.nodes[] | .networks[] as $n |
    [
      .name,
      $n.net_tier,
      (.hardware |
        .vcpu,
        .mem,
        # merge the disk objects into one, then pick both disk sizes
        (.disks | add["disk001","disk002"])
      )
    ]
  )
| @csv

$ jq -r -f program.jq input.json
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20

Basically, you want to project the fields you need into arrays so that each array can be turned into a CSV row with @csv. Your input suggests that a node could have multiple networks, which is why the program iterates with .networks[] as $n: it emits one row per node/network combination, flattening all combinations into separate rows.
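To check the row-flattening idea outside jq, here is a rough Python sketch of the same logic (Python is used purely for illustration). It emits one row per node/network pair and merges each node's disk objects the way jq's add merges objects, with later keys winning. The DOC literal is the question's sample trimmed to the fields used, and csv_field is a hypothetical helper that only approximates @csv for scalar values.

```python
# Sketch only: mirrors the jq program above in plain Python.
# DOC is the question's sample, trimmed to the fields the row uses.
DOC = {
    "project": "Project X",
    "nodes": [
        {
            "name": "server001",
            "networks": [{"net_tier": "network_tier_001"}],
            "hardware": {
                "vcpu": 1,
                "mem": 1024,
                "disks": [
                    {"disk001": 40, "detail001": "foo"},
                    {"disk002": 20, "detail001": "bar"},
                ],
            },
        }
    ],
}

HEADER = ["project", "name", "net_tier", "vcpu", "mem", "disk001", "disk002"]


def csv_field(value):
    """Hypothetical helper approximating jq's @csv for scalar values."""
    if value is None:
        return ""                       # @csv renders null as an empty field
    if isinstance(value, bool):         # check bool before int (bool subclasses int)
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)               # numbers stay unquoted
    return '"' + str(value).replace('"', '""') + '"'  # strings quoted, quotes doubled


def to_csv_line(values):
    return ",".join(csv_field(v) for v in values)


def rows(doc):
    """One row per (node, network) pair, like `.nodes[] | .networks[] as $n`."""
    for node in doc["nodes"]:
        hardware = node["hardware"]
        merged = {}
        for disk in hardware["disks"]:  # like `add` on an array of objects
            merged.update(disk)         # later keys win, as with jq's object `+`
        for net in node["networks"]:
            yield [doc["project"], node["name"], net["net_tier"],
                   hardware["vcpu"], hardware["mem"],
                   merged.get("disk001"), merged.get("disk002")]


def to_csv(doc):
    return [to_csv_line(HEADER)] + [to_csv_line(r) for r in rows(doc)]


if __name__ == "__main__":
    print("\n".join(to_csv(DOC)))
```

Running it prints the same two lines as the jq invocation above.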

Here's another approach, short enough to speak for itself:

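# s(f): the first value of f found anywhere in the input (searched with ..)
# that is neither null nor false; null if there is none
def s(f): first(.. | f? // empty) // null;

[s(.project), s(.name), s(.net_tier), s(.vcpu), s(.mem), s(.disk001), s(.disk002)]
| @csv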

Invocation:

$ jq -r -f value-pairs.jq input.json

Result:

"Project X","server001","network_tier_001",1,1024,40,20

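The s/1 helper is a depth-first search: .. visits the input and every value nested in it in pre-order, f? quietly produces nothing where f does not apply, // empty discards null and false, and first keeps the earliest hit. A minimal Python sketch of that behaviour, assuming the question's sample as DOC (trimmed of a few unused fields), for readers who want to reason about which match wins:

```python
# Sketch only: a Python mirror of  def s(f): first(.. | f? // empty) // null;
DOC = {
    "project": "Project X",
    "description": "This is a description of Project X",
    "nodes": [
        {
            "name": "server001",
            "networks": [{"net_tier": "network_tier_001", "ip_address": "10.1.1.10"}],
            "hardware": {
                "vcpu": 1,
                "mem": 1024,
                "disks": [
                    {"disk001": 40, "detail001": "foo"},
                    {"disk002": 20, "detail001": "bar"},
                ],
            },
            "os": "debian8",
        }
    ],
    "detail001": "foo",
}

KEYS = ["project", "name", "net_tier", "vcpu", "mem", "disk001", "disk002"]


def walk(value):
    """Yield the value itself, then everything nested in it (pre-order), like jq's `..`."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from walk(child)
    elif isinstance(value, list):
        for child in value:
            yield from walk(child)


def s(doc, key):
    """First value stored under `key` anywhere in doc, skipping None and False; else None."""
    for value in walk(doc):
        if isinstance(value, dict):     # only objects can hold the key
            found = value.get(key)
            if found is not None and found is not False:  # what `// empty` discards
                return found
    return None                         # the trailing `// null`


if __name__ == "__main__":
    print([s(DOC, k) for k in KEYS])
```

As with jq's //, only None (null) and False are skipped, so a stored 0 would still be returned. Because the search takes the first match in document order, s(.name) would pick up an unrelated "name" key if one appeared earlier in the tree, which is worth keeping in mind for real data.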
With headers

Using the same s/1 as above:

. as $d
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"]
# emit the header row itself, then the value s/1 finds for each header name
| (., map( . as $v | $d | s(.[$v])))
| @csv

With multiple nodes

Again with s/1 as above:

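.project as $p
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"] as $h
| ($h,
   # one row per node: search within that node only,
   # then put the top-level project name in the first column
   (.nodes[] as $d
   | $h
   | map( . as $v | $d | s(.[$v]) )
   | .[0] = $p)
   ) | @csv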

Output for illustrative multi-node data (the server002 node shown further below, appended to .nodes):

"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
"Project X","server002","network_tier_002",1,1024,,40

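The multi-node variant applies the same search to each node separately and then overwrites the first column with the top-level project name, because project lives outside .nodes. A Python sketch of that loop, assuming a DOC made of the question's node (trimmed) plus the server002 node shown further below; walk and s are small Python stand-ins for jq's .. and the s/1 helper:

```python
# Sketch only: Python mirror of the multi-node jq filter built on s/1.
DOC = {
    "project": "Project X",
    "nodes": [
        {
            "name": "server001",
            "networks": [{"net_tier": "network_tier_001"}],
            "hardware": {"vcpu": 1, "mem": 1024,
                         "disks": [{"disk001": 40, "detail001": "foo"},
                                   {"disk002": 20, "detail001": "bar"}]},
        },
        {
            "name": "server002",
            "networks": [{"net_tier": "network_tier_002"}],
            "hardware": {"vcpu": 1, "mem": 1024,
                         "disks": [{"disk002": 40, "detail001": "foo"}]},
        },
    ],
}

HEADER = ["project", "name", "net_tier", "vcpu", "mem", "disk001", "disk002"]


def walk(value):
    """Pre-order traversal, like jq's `..`."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from walk(child)
    elif isinstance(value, list):
        for child in value:
            yield from walk(child)


def s(doc, key):
    """Like s(.[key]): first non-null, non-false value under `key`, else None."""
    for value in walk(doc):
        if isinstance(value, dict):
            found = value.get(key)
            if found is not None and found is not False:
                return found
    return None


def rows(doc):
    yield list(HEADER)                            # the header row, like `$h,`
    for node in doc["nodes"]:                     # like `.nodes[] as $d`
        row = [s(node, name) for name in HEADER]  # like `map(. as $v | $d | s(.[$v]))`
        row[0] = doc["project"]                   # like `.[0] = $p`
        yield row


if __name__ == "__main__":
    for row in rows(DOC):
        print(row)
```

Missing values come back as None, which @csv renders as an empty field; that is the empty disk001 column for server002 in the output above.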
Here is a different filter that computes the unique sets of network tier names and disk names across all nodes, and then generates a result whose columns match the data.

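  # collect the distinct network tiers and disk names across all nodes
  {
    tiers: [ .nodes[].networks[].net_tier ] | unique
  , disks: [ .nodes[].hardware.disks[] | keys[] | select(startswith("disk")) ] | unique
  } as $n

| def column_names($n): [ "project", "name" ] + $n.tiers + ["vcpu", "mem"] + $n.disks ;
  # for each known tier and each of the node's networks: the tier name on a match, else null
  # (one value per tier column as long as each node has a single network)
  def tiers($n):        [ $n.tiers[] as $t | .networks[] | if .net_tier==$t then $t else null end ] ;
  # for each known disk name: the first non-null size among the node's disks, else null
  def disks($n):        [ $n.disks[] as $d | map(select(.[$d]!=null)|.[$d])[0] ] ;
  # one row per node, in the same column order as column_names
  def rows($n):
      .project as $project
    | .nodes[]
    | .name as $name
    | tiers($n) as $tier_values
    | .hardware
    | .vcpu as $vcpu
    | .mem as $mem
    | .disks
    | disks($n) as $disk_values
    | [$project, $name] + $tier_values + [$vcpu, $mem] + $disk_values
  ;
  column_names($n), rows($n)

| @csv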

The benefit of this approach becomes apparent if we append another node to the nodes array in the sample data:

{
  "name": "server002",
  "networks": [
    {
      "net_tier": "network_tier_002"
    }
  ],
  "hardware": {
    "vcpu": 1,
    "mem": 1024,
    "disks": [
      {
        "disk002": 40,
        "detail001": "foo"
      }
    ]
  }
}

Sample run (assuming the filter is in filter.jq and the amended data in data.json):

$ jq -Mr -f filter.jq data.json
"project","name","network_tier_001","network_tier_002","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",,1,1024,40,20
"Project X","server002",,"network_tier_002",1,1024,,40

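The same column discovery can be sketched in Python, assuming DOC holds the question's node plus the server002 node above (trimmed to the fields used) and a hypothetical csv_field helper that approximates @csv for scalars. One deliberate difference: this sketch checks tier membership per node, so each node yields exactly one value per tier column even if it has several networks, whereas tiers/1 in the jq filter emits one value per tier per network.

```python
# Sketch only: Python approximation of the "columns from the data" jq filter.
DOC = {
    "project": "Project X",
    "nodes": [
        {
            "name": "server001",
            "networks": [{"net_tier": "network_tier_001"}],
            "hardware": {
                "vcpu": 1,
                "mem": 1024,
                "disks": [
                    {"disk001": 40, "detail001": "foo"},
                    {"disk002": 20, "detail001": "bar"},
                ],
            },
        },
        {
            "name": "server002",
            "networks": [{"net_tier": "network_tier_002"}],
            "hardware": {
                "vcpu": 1,
                "mem": 1024,
                "disks": [{"disk002": 40, "detail001": "foo"}],
            },
        },
    ],
}


def csv_field(value):
    """Hypothetical helper approximating jq's @csv for scalar values."""
    if value is None:
        return ""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return '"' + str(value).replace('"', '""') + '"'


def to_csv_line(values):
    return ",".join(csv_field(v) for v in values)


def dynamic_csv(doc):
    nodes = doc["nodes"]
    # Distinct names, sorted, as jq's `unique` returns them.
    tiers = sorted({net["net_tier"] for node in nodes for net in node["networks"]})
    disks = sorted({key for node in nodes
                    for disk in node["hardware"]["disks"]
                    for key in disk if key.startswith("disk")})
    lines = [to_csv_line(["project", "name"] + tiers + ["vcpu", "mem"] + disks)]
    for node in nodes:
        hardware = node["hardware"]
        used = {net["net_tier"] for net in node["networks"]}
        # Exactly one value per tier column (unlike the jq tiers/1 for multi-network nodes).
        tier_values = [t if t in used else None for t in tiers]
        # First non-null size stored under each disk name, else None.
        disk_values = [next((d[name] for d in hardware["disks"] if d.get(name) is not None), None)
                       for name in disks]
        row = ([doc["project"], node["name"]] + tier_values
               + [hardware["vcpu"], hardware["mem"]] + disk_values)
        lines.append(to_csv_line(row))
    return lines


if __name__ == "__main__":
    print("\n".join(dynamic_csv(DOC)))
```

For the two-node sample it prints a header and one row per node, with empty fields where a node lacks a tier or a disk.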
