jq: Flatten objects with unnecessary nested levels

I'm facing a problem with a JSON file where the same key sometimes has a flat value, while at other times it has an additional nested (and, for my purposes, unnecessary) level which then contains the related value.

The file is newline-delimited, and I am trying to get rid of any additional levels. So far I've only managed to do that when the nested level appears in the first branch of the tree, using

jq -c '[.] | map(.[] |= if type == "object" and (.numberLong | length) > 0 then .numberLong else . end) | .[]' mongoDB.json

The example below illustrates this further. What I have initially:

  {
    "name": "John",
    "age": {
        "numberLong": 22
      }
  }
  {
    "name": "Jane",
    "age": 24
  }
  {
    "name": "Dennis",
    "age": 34,
    "details": [
      {
        "telephone_number": 555124124
      }
    ]
  }
  {
    "name": "Frances",
    "details": [
      {
        "telephone_number": {
            "numberLong": 444245523
          }
      }
    ]
  }

What my script does (the second numberLong is ignored):

  {
    "name": "John",
    "age": 22
  }
  {
    "name": "Jane",
    "age": 24
  }
  {
    "name": "Dennis",
    "age": 34,
    "details": [
      {
        "telephone_number": 555124124
      }
    ]
  }
  {
    "name": "Frances",
    "details": [
      {
        "telephone_number":  {
            "numberLong": 444245523
          }
      }
    ]
  }

What I am actually hoping to achieve (recursively copy the values of all numberLong keys one level up, regardless of where they appear in the file):

[
  {
    "name": "John",
    "age": 22
  },
  {
    "name": "Jane",
    "age": 24
  },
  {
    "name": "Dennis",
    "age": 34,
    "details": [
      {
        "telephone_number": 555124124
      }
    ]
  },
  {
    "name": "Frances",
    "details": [
      {
        "telephone_number": 444245523
      }
    ]
  }
]

This transformation is part of a daily pipeline and is applied to several files with sizes up to 70GB, so the speed of traversing the files could potentially be an issue. The problem stems from MongoDB's different types: see MongoDB differences between NumberLong and simple Integer?

Thanks!

If your jq has 'walk/1', then the simplest completely generic solution would be along these lines:

walk( if type=="object"
      then with_entries( if .value | (type == "object" and has("numberLong"))
                         then .value |= .numberLong
                         else . end)
      else . end )
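
Applied to the newline-delimited input above, a complete invocation would look like this (the filter is unchanged; mongoDB.json is the file name from your own command):

jq -c 'walk( if type=="object"
             then with_entries( if .value | (type == "object" and has("numberLong"))
                                then .value |= .numberLong
                                else . end)
             else . end )' mongoDB.json

This emits each flattened document on its own line; adding the -s option would instead produce a single JSON array like the one in your "hoping to achieve" example, at the cost of holding all the documents in memory at once.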

If your jq does not have 'walk', then it would be best to upgrade, as that will also improve speed; otherwise, you can google for its jq definition.
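
For reference, a widely circulated definition of 'walk', which can simply be pasted in front of the filter above, is along these lines:

def walk(f):
  . as $in
  | if type == "object" then
      reduce keys[] as $key
        ( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
    elif type == "array" then map( walk(f) ) | f
    else f
    end;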

If this is too slow for your very large files, you may have to track down the precise locations where the transformation is needed to avoid the overhead of a completely generic approach.
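
For instance, if the wrapped values can only ever occur at .age and at .details[].telephone_number, as in the examples above (an assumption about your real data), then a hand-targeted filter along the following lines avoids recursing into every value; the helper name 'unwrap' is made up for illustration:

def unwrap:
  if type == "object" and has("numberLong") then .numberLong else . end;
if has("age") then .age |= unwrap else . end
| if has("details") then .details[].telephone_number |= unwrap else . end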

Notes on handling very large files

Your example ("What I have initially") gives a stream of objects, so it might be worth pointing out that since jq is stream-oriented, it has no problem handling very large files consisting of streams of JSON entities (aka "documents") that are not individually very large.

(An approximate rule of thumb is that if the largest JSON entity in the input has size N units, and the largest JSON entity created by jq has size M units, then jq might need access to about M + N + max(M,N) units of memory. With M = N = 1GB, for example, that comes to about 3GB, regardless of the total file size.)

To handle a very large file containing a single JSON array, it might be advisable to begin by producing a stream of the top-level elements for subsequent processing.
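
A minimal sketch of that step, assuming the large file holds one top-level array (big.json is a made-up name), uses jq's --stream option to emit each element on its own line without reading the whole array into memory:

jq -cn --stream 'fromstream( 1 | truncate_stream(inputs) )' big.json

The resulting stream of objects can then be piped into the walk-based filter above.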

In the worst possible case (a very large file containing a single very large, complex JSON document), you might have to use a streaming parser such as the one built into jq.

For illustrations of various techniques for handling very large files, see Process huge GEOJson file with jq.
