简体   繁体   中英

Use JQ to parse JSON nested objects, using select to match key-value in nested object while showing existing structure

Use JQ to parse JSON nested objects, using select to match key-value in nested object while showing existing structure

I am trying to take a complex JSON file of 20,000+ lines and extract a specific key while retaining the surrounding metadata which adds necessary human-understandable context.


Data source (complex structure):

{
  "Marketplace": [
    {
      "Level1Name": "Company A Products",
      "Level1Array": [
        {
          "Level2Name": "USA Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://a.com/products"
            },
            {
              "Level3Name": "Subset1001",
              "Level3URL": "https://a.com/products/subset1001"
            }
          ]
        }
      ]
    },
    {
      "Level1Name": "Company B Products",
      "Level1Array": [
        {
          "Level2Name": "USA Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://b.com/products"
            },
            {
              "Level3Name": "Subset500",
              "Level3URL": "https://b.com/products/subset500"
            }
          ]
        },
        {
          "Level2Name": "EU Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://b.eu/products"
            },
            {
              "Level3Name": "Subset200",
              "Level3URL": "https://b.eu/products/subset200"
            }
          ]
        }
      ]
    },
    {
      "Level1Name": "Company X Products",
      "Level1Array": [
        {
          "Level2Name": "Deleted Products",
          "Level2URL": "https://internal.x.com/products"
        }
      ]
    }
  ]
}

JQ command currently used to extract removes all other contextual metadata...

jq -r '(
         .Marketplace[].Level1Array[].Level2Contents[]
         | select (.Level3Name | index("ALL"))
         | [.]
         )'

Output given...

[
  {
    "Level3Name": "ALL",
    "Level3URL": "https://a.com/products"
  }
]
[
  {
    "Level3Name": "ALL",
    "Level3URL": "https://b.com/products"
  }
]
[
  {
    "Level3Name": "ALL",
    "Level3URL": "https://b.eu/products"
  }
]

Option 1 output desired, same JSON structure with removing all other objects which does not match select filter "ALL" string criteria

{
    "Marketplace":
  [
        {
            "Level1Name": "Company A Products",
            "Level1Array": [
                {
                    "Level2Name": "USA Products List",
                    "Level2Contents": [
                        {
                            "Level3Name": "ALL",
                            "Level3URL": "https://a.com/products"
                        }
                    ]
                }
            ]
        },
        {
            "Level1Name": "Company B Products",
            "Level1Array": [
                {
                    "Level2Name": "USA Products List",
                    "Level2Contents": [
                        {
                            "Level3Name": "ALL",
                            "Level3URL": "https://b.com/products"
                        }
                    ]
                },
                {
                    "Level2Name": "EU Products List",
                    "Level2Contents": [
                        {
                            "Level3Name": "ALL",
                            "Level3URL": "https://b.eu/products"
                        }
                    ]
                }
            ]
        }
    ]
}

Option 2 output desired, any similar format which can be iterated with loop such as:

{
  "Marketplace":
  [
    {
      "Level1Name": "Company A Products",
      "Level2Name": "USA Products List",
      "Level3Name": "ALL",
      "Level3URL": "https://a.com/products"
    },
    {
      "Level1Name": "Company B Products",
      "Level2Name": "USA Products List",
      "Level3Name": "ALL",
      "Level3URL": "https://b.com/products"
    },
    {
      "Level1Name": "Company B Products",
      "Level2Name": "EU Products List",
      "Level3Name": "ALL",
      "Level3URL": "https://b.eu/products"
    }
  ]
}

The following filter produces the "Option 2" output:

.Marketplace |= map(
  {Level1Name} as $Level1Name
  | .Level1Array[]
  | {Level2Name} as $Level2Name
  | .Level2Contents[]?
  | select(.Level3Name == "ALL")
  | $Level1Name + $Level2Name + . )

Breaking it down ...

One way to understand this is to consider:

.Marketplace[]
| {Level1Name} as $Level1Name
| .Level1Array[]
| {Level2Name} as $Level2Name
| .Level2Contents[]?             # in case .Level2Contents is missing
| if (.Level3Name == "ALL")
  then $Level1Name + $Level2Name + .
  else empty
  end

Addendum: "Name"

The OP subsequently asked what can be done if the three levels of "Name" keys are all named "Name". An answer can easily be obtained by tweaking the above, to yield:

.Marketplace |= map(
  {Level1Name: .Name} as $Level1Name
  | .Level1Array[]
  | {Level2Name: .Name} as $Level2Name
  | .Level2Contents[]?
  | select(.Name == "ALL")
  | $Level1Name + $Level2Name + . )

Output

In this case, the output would be as follows:

{
  "Marketplace": [
    {
      "Level1Name": "Company A Products",
      "Level2Name": "USA Products List",
      "Name": "ALL",
      "Level3URL": "https://a.com/products"
    },
    {
      "Level1Name": "Company B Products",
      "Level2Name": "USA Products List",
      "Name": "ALL",
      "Level3URL": "https://b.com/products"
    },
    {
      "Level1Name": "Company B Products",
      "Level2Name": "EU Products List",
      "Name": "ALL",
      "Level3URL": "https://b.eu/products"
    }
  ]
}

Here's a different approach you could take for this problem. As I understand it, you want a way to search your recursive tree of objects for some value, and remove any objects that do not have a property with that value.

What you could do is search for the paths of all values you wish to keep (has the value you're searching for), then remove all other objects that are not on the paths of any of the paths to keep.

def is_subpath($paths): [., length] as [$path, $length] |
    any($paths[]; $length <= length and $path == .[:$length]);
[paths(strings == "ALL")[:-1]] as $keepers
| delpaths([paths(objects) | select(is_subpath($keepers) | not)])
{
  "Marketplace": [
    {
      "Level1Name": "Company A Products",
      "Level1Array": [
        {
          "Level2Name": "USA Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://a.com/products"
            }
          ]
        }
      ]
    },
    {
      "Level1Name": "Company B Products",
      "Level1Array": [
        {
          "Level2Name": "USA Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://b.com/products"
            }
          ]
        },
        {
          "Level2Name": "EU Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://b.eu/products"
            }
          ]
        }
      ]
    }
  ]
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM