Flatten / normalize JSON array of objects with jq

I have a large JSON array of objects. Each object contains a foreignKeyId, a url, (optionally) a urlMirror1, and (optionally) a urlMirror2.

Here's a sample:

[
  {
    "foreignKeyId": 1,
    "url": "https://1-url.com"
  },
  {
    "foreignKeyId": 2,
    "url": "https://2-url.com",
    "urlMirror1": "https://2-url-mirror-1.com"
  },
  {
    "foreignKeyId": 3,
    "url": "https://3-url.com",
    "urlMirror1": "https://3-url-mirror-1.com",
    "urlMirror2": "https://3-url-mirror-2.com"
  }
]

I want to normalize this JSON into something like the following:

[
  {
    "foreignKeyId": 1,
    "primariness": 1,
    "url": "https://1-url.com"
  },
  {
    "foreignKeyId": 2,
    "primariness": 1,
    "url": "https://2-url.com"
  },
  {
    "foreignKeyId": 2,
    "primariness": 2,
    "url": "https://2-url-mirror-1.com"
  },
  {
    "foreignKeyId": 3,
    "primariness": 1,
    "url": "https://3-url.com"
  },
  {
    "foreignKeyId": 3,
    "primariness": 2,
    "url": "https://3-url-mirror-1.com"
  },
  {
    "foreignKeyId": 3,
    "primariness": 3,
    "url": "https://3-url-mirror-2.com"
  }
]

Is there a way to do something like this using jq? If not, are there any other suggestions for accomplishing this quickly without writing too much custom code? This only needs to be run once, so any kind of hacky one-off solution would work (bash script, etc.).

Thanks!

Update: primariness should be derived from the key names (url => 1, urlMirror1 => 2, urlMirror2 => 3). The order of the keys inside any given object is insignificant. There is a fixed number of mirrors (e.g., there is never a urlMirror3).

Here is a simple script with a hardcoded number of mirrors and primariness values. Hope it does the trick.

jq '
    map(
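        { foreignKeyId } +            # copied into every emitted object
        (
            { primariness: 1, url },  # the primary URL is always present
            (.urlMirror1 // empty | { primariness: 2, url: . }),  # skipped when absent or null
            (.urlMirror2 // empty | { primariness: 3, url: . })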
        )
    )
' input.json
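If jq isn't at hand, the same hardcoded mapping (url => 1, urlMirror1 => 2, urlMirror2 => 3) can be sketched in Python. This is a minimal sketch, not part of the answer above: it assumes the input is a valid JSON array shaped like the question's sample, and PRIMARINESS and normalize are hypothetical names chosen for illustration.

```python
import json

# Hypothetical lookup table: key name -> primariness (fixed number of mirrors).
PRIMARINESS = {"url": 1, "urlMirror1": 2, "urlMirror2": 3}


def normalize(records):
    """Return one {foreignKeyId, primariness, url} row per URL present in each record."""
    rows = []
    for rec in records:
        # Walk the table in primariness order, so key order inside rec does not matter.
        for key, rank in sorted(PRIMARINESS.items(), key=lambda kv: kv[1]):
            # Skip absent or null mirrors, roughly what `.urlMirrorN // empty` does in jq.
            if rec.get(key) is not None:
                rows.append({"foreignKeyId": rec["foreignKeyId"],
                             "primariness": rank,
                             "url": rec[key]})
    return rows


sample = [
    {"foreignKeyId": 1, "url": "https://1-url.com"},
    {"foreignKeyId": 2, "url": "https://2-url.com",
     "urlMirror1": "https://2-url-mirror-1.com"},
    {"foreignKeyId": 3, "url": "https://3-url.com",
     "urlMirror2": "https://3-url-mirror-2.com",  # deliberately out of order
     "urlMirror1": "https://3-url-mirror-1.com"},
]
print(json.dumps(normalize(sample), indent=2))
```

To process a file instead of the inline sample, something like normalize(json.load(open("input.json"))) would do. Iterating over the table rather than over the object's keys is what makes the key order irrelevant.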

Given that the OP has narrowed the question from a generic one down to more specific criteria, the answer provided by @luciole75w is most probably the best; refer to that one.

Now, for @oguzismail: here is a generic jtc approach (it handles an arbitrary number of "urlMirror"s) made of 3 JSON transformation steps (updated solution):

<file.json jtc -w'<foreignKeyId>l:<f>v[-1]<urlM>L:<u>v[^0]' \
               -i'{"url":{{u}},"foreignKeyId":{f}}' /\
               -w'[foreignKeyId]:<f>q:<p:0>v[^0][foreignKeyId]:<f>s:[-1]<p>I1' \
               -i'{"primariness":{{p}}}' /\
               -pw'<urlM>L:' -tc
[
   { "foreignKeyId": 1, "primariness": 1, "url": "https://1-url.com" },
   { "foreignKeyId": 2, "primariness": 1, "url": "https://2-url.com" },
   { "foreignKeyId": 3, "primariness": 1, "url": "https://3-url.com" },
   { "foreignKeyId": 2, "primariness": 2, "url": "https://2-url-mirror-1.com" },
   { "foreignKeyId": 3, "primariness": 2, "url": "https://3-url-mirror-1.com" },
   { "foreignKeyId": 3, "primariness": 3, "url": "https://3-url-mirror-2.com" }
]

Explanation and visualization:

- All 3 steps can be observed in "slow motion":
1. For each "foreignKeyId" found, and each "urlMirror" found within the same record, extend (insert into) the array with {"url": ..., "foreignKeyId": ...}:

<file.json jtc -w'<foreignKeyId>l:<f>v[-1]<urlM>L:<u>v[^0]' \
               -i'{"url":{{u}},"foreignKeyId":{f}}' -tc
[
   { "foreignKeyId": 1, "url": "https://1-url.com" },
   { "foreignKeyId": 2, "url": "https://2-url.com", "urlMirror1": "https://2-url-mirror-1.com" },
   { "foreignKeyId": 3, "url": "https://3-url.com", "urlMirror1": "https://3-url-mirror-1.com", "urlMirror2": "https://3-url-mirror-2.com" },
   { "foreignKeyId": 2, "url": "https://2-url-mirror-1.com" },
   { "foreignKeyId": 3, "url": "https://3-url-mirror-1.com" },
   { "foreignKeyId": 3, "url": "https://3-url-mirror-2.com" }
]

2. Now insert a "primariness": N entry into each record, based on the occurrence index of its foreignKeyId:

<file.json jtc -w'<foreignKeyId>l:<f>v[-1]<urlM>L:<u>v[^0]' \
               -i'{"url":{{u}},"foreignKeyId":{f}}' /\
               -w'[foreignKeyId]:<f>q:<p:0>v[^0][foreignKeyId]:<f>s:[-1]<p>I1' \
               -i'{"primariness":{{p}}}' -tc
[
   { "foreignKeyId": 1, "primariness": 1, "url": "https://1-url.com" },
   { "foreignKeyId": 2, "primariness": 1, "url": "https://2-url.com", "urlMirror1": "https://2-url-mirror-1.com" },
   { "foreignKeyId": 3, "primariness": 1, "url": "https://3-url.com", "urlMirror1": "https://3-url-mirror-1.com", "urlMirror2": "https://3-url-mirror-2.com" },
   { "foreignKeyId": 2, "primariness": 2, "url": "https://2-url-mirror-1.com" },
   { "foreignKeyId": 3, "primariness": 2, "url": "https://3-url-mirror-1.com" },
   { "foreignKeyId": 3, "primariness": 3, "url": "https://3-url-mirror-2.com" }
]

3. And the final step (-pw'<urlM>L:'): get rid of all the redundant "urlMirror" entries.
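The three steps can be mirrored in plain Python to make the data flow explicit. This sketch illustrates the idea only, not how jtc works internally: the function names are hypothetical, and mirror keys are recognized by the substring "urlM", in the spirit of the <urlM>L: search in the commands above. Because step 2 numbers records by their position in the extended array, the sketch sorts each record's keys so that urlMirror1 is appended before urlMirror2.

```python
import json
from collections import Counter


def step1_append_mirrors(records):
    """Step 1: for every urlMirror* key, append a {foreignKeyId, url} record."""
    out = [dict(rec) for rec in records]  # shallow copies; the input is left untouched
    for rec in records:
        # sorted() puts urlMirror1 before urlMirror2 (adequate for fewer than 10 mirrors)
        for key in sorted(rec):
            if "urlM" in key:  # unanchored substring match
                out.append({"foreignKeyId": rec["foreignKeyId"], "url": rec[key]})
    return out


def step2_number_occurrences(records):
    """Step 2: primariness = running count of each foreignKeyId, in array order."""
    seen = Counter()
    for rec in records:
        seen[rec["foreignKeyId"]] += 1
        rec["primariness"] = seen[rec["foreignKeyId"]]
    return records


def step3_drop_mirrors(records):
    """Step 3: purge the now-redundant urlMirror* keys."""
    return [{k: v for k, v in rec.items() if "urlM" not in k} for rec in records]


sample = [
    {"foreignKeyId": 1, "url": "https://1-url.com"},
    {"foreignKeyId": 2, "url": "https://2-url.com",
     "urlMirror1": "https://2-url-mirror-1.com"},
    {"foreignKeyId": 3, "url": "https://3-url.com",
     "urlMirror1": "https://3-url-mirror-1.com",
     "urlMirror2": "https://3-url-mirror-2.com"},
]

result = step3_drop_mirrors(step2_number_occurrences(step1_append_mirrors(sample)))
for row in result:
    print(json.dumps(row))
```

As in the jtc output, the records come out grouped by step (originals first, then appended mirrors) rather than by foreignKeyId.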

Optionally: if the records within the top-level array need to be sorted as in the OP's example, this additional step will do it: -jw'[foreignKeyId]:<>g:[-1]'
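In Python terms, that optional sort amounts to a stable sort on foreignKeyId: records sharing a key keep the order they already have, which after step 2 is ascending primariness. A minimal sketch using the six records from the final jtc output (it illustrates the ordering only and makes no claim about how jtc's sort is implemented):

```python
# The six rows in the order the jtc pipeline emits them.
rows = [
    {"foreignKeyId": 1, "primariness": 1, "url": "https://1-url.com"},
    {"foreignKeyId": 2, "primariness": 1, "url": "https://2-url.com"},
    {"foreignKeyId": 3, "primariness": 1, "url": "https://3-url.com"},
    {"foreignKeyId": 2, "primariness": 2, "url": "https://2-url-mirror-1.com"},
    {"foreignKeyId": 3, "primariness": 2, "url": "https://3-url-mirror-1.com"},
    {"foreignKeyId": 3, "primariness": 3, "url": "https://3-url-mirror-2.com"},
]

# sorted() is stable: rows with the same foreignKeyId keep their relative order,
# which here is already ascending primariness.
ordered = sorted(rows, key=lambda r: r["foreignKeyId"])
for r in ordered:
    print(r["foreignKeyId"], r["primariness"], r["url"])
```

Sorting on the tuple (foreignKeyId, primariness) would give the same result without relying on stability.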

PS. It so happens that I'm also a developer of the jtc unix tool.

Here's a general solution; that is, it will handle arbitrarily many urlMirrors.

For the sake of clarity, let's begin by defining a helper function that emits a stream of {foreignKeyId, primariness, url} objects for a single input object:

def primarinesses:
  {foreignKeyId} +
    ({primariness: 1, url},                                     # the primary URL
     (to_entries[]
      | (.key | capture("^urlMirror(?<n>[0-9]+)")) as $n        # keys that do not match emit nothing
      | {primariness: ($n.n | tonumber + 1), url: .value} )) ;  # urlMirrorN => N + 1

The solution is then simply:

[.[] | primarinesses]

which can also be written with less punctuation as:

map(primarinesses)
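The same regex-driven idea carries over to Python when jq isn't available. A minimal sketch, assuming mirror keys look like urlMirrorN and that urlMirrorN maps to primariness N + 1, as in the helper above; the record with a urlMirror12 key is made-up data, included only to show that the number of mirrors isn't fixed.

```python
import re

# re.match anchors at the start of the key, like "^" in the jq capture() pattern.
MIRROR_KEY = re.compile(r"urlMirror(?P<n>[0-9]+)")


def primarinesses(record):
    """Yield {foreignKeyId, primariness, url} for one input object (any number of mirrors)."""
    # .get mirrors jq, where a missing url would simply become null.
    yield {"foreignKeyId": record.get("foreignKeyId"), "primariness": 1, "url": record.get("url")}
    for key, value in record.items():
        m = MIRROR_KEY.match(key)
        if m:  # non-matching keys produce nothing, as capture() does in jq
            yield {"foreignKeyId": record.get("foreignKeyId"),
                   "primariness": int(m.group("n")) + 1,
                   "url": value}


# Hypothetical record with more mirrors than the question's data.
data = [
    {"foreignKeyId": 7, "url": "https://7-url.com",
     "urlMirror1": "https://7-url-mirror-1.com",
     "urlMirror12": "https://7-url-mirror-12.com"},
]

# Plays the role of map(primarinesses): flatten the per-object streams into one list.
flat = [row for rec in data for row in primarinesses(rec)]
for row in flat:
    print(row)
```

Like jq's to_entries, iterating over a Python dict follows insertion order, so mirrors are emitted in the order they appear in each object.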

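Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM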