jq: Group by nested structures and flatten JSON

I'm new to jq and to command-line tools in general. I need to group by nested structures in a JSON file and flatten them, and I haven't been able to find a workable solution for a few days. Here's a sample of my JSON:

[
  {
    "Value1": "0",
    "Conversions": "0",
    "Revenue": "0.00",
    "serverTimestamp": 84615198,
    "pluginsIcons": [
      {
        "pluginName": "pdf",
        "pluginIcon": "pdf1"
      },
      {
        "pluginName": "java",
        "pluginIcon": "java1"
      }
    ],
    "plugins": "pdf, java",
    "customVariables": {
      "3": {
        "customVariableValue3": "F",
        "customVariableName3": "Gender"
      },
      "2": {
        "customVariableValue2": "Person",
        "customVariableName2": "Role"
      },
      "1": {
        "customVariableValue1": "Partner1",
        "customVariableName1": "Partner"
      }
    },
    "interactions": "7",
    "actions": "3",
    "actionDetails": [
      {
        "timestamp": 84615195,
        "interactionPosition": "1",
        "type": "action"
      },
      {
        "timestamp": 84615145,
        "interactionPosition": "2",
        "type": "action"
      },
      {
        "timestamp": 84615693,
        "interactionPosition": "3",
        "type": "action",
        "customVariables": {
          "2": {
            "customVariablePageValue2": "value2",
            "customVariablePageName2": "name2"
          },
          "1": {
            "customVariablePageValue1": "value1",
            "customVariablePageName1": "name1"
          }
        }
      }
    ],
    "operatingSystem": "Windows 10"
  },
  {
    "Value1": "18",
    "Conversions": "1",
    "Revenue": "0.00",
    "serverTimestamp": 84615189,
    "pluginsIcons": [
      {
        "pluginName": "pdf",
        "pluginIcon": "pdf1"
      }
    ],
    "plugins": "pdf",
    "customVariables": {
      "3": {
        "customVariableValue3": "M",
        "customVariableName3": "Gender"
      },
      "2": {
        "customVariableValue2": "Admin",
        "customVariableName2": "Role"
      },
      "1": {
        "customVariableValue1": "Place",
        "customVariableName1": "Subdomain"
      }
    },
    "interactions": "6",
    "actions": "3",
    "actionDetails": [
      {
        "timestamp": 84635189,
        "timeSpent": "11",
        "interactionPosition": "1",
        "type": "action"
      },
      {
        "timestamp": 846351834,
        "timeSpent": "11",
        "interactionPosition": "2",
        "type": "search"
      },
      {
        "timestamp": 846351832,
        "timeSpent": "1",
        "interactionPosition": "3",
        "type": "action",
        "customVariables": {
          "2": {
            "customVariablePageValue2": "value2",
            "customVariablePageName2": "name2"
          },
          "1": {
            "customVariablePageValue3": "value3",
            "customVariablePageName3": "name3"
          }
        },
        "generationTime": "890"
      }
    ],
    "operatingSystem": "Windows 10"
  }
]

The end result should contain one flattened entry for each "action" in the nested arrays under "actionDetails".

I have been able to flatten the structures, but grouping afterwards (and duplicating the other columns for each action) becomes convoluted. Grouping by the "action"s before flattening has not worked for me because they're nested.

Here is how the first entry in the original JSON should look afterwards:

[
  {
    "timestamp": 84615195,
    "interactionPosition": "1",
    "type": "action",
    "Value1": "0",
    "Conversions": "0",
    "Revenue": "0.00",
    "pluginName1": "pdf",
    "pluginIcon1": "pdf",
    "pluginName2": "java",
    "pluginIcon2": "java",
    "plugins": "pdf, java",
    "Gender": "F",
    "Role": "Person",
    "Partner": "Partner1",
    "interactions": "7",
    "actions": "3",
    "operatingSystem": "Windows 10"
  },
  {
    "timestamp": 84615145,
    "interactionPosition": "2",
    "type": "action",
    "Value1": "0",
    "Conversions": "0",
    "Revenue": "0.00",
    "pluginName1": "pdf",
    "pluginIcon1": "pdf",
    "pluginName2": "java",
    "pluginIcon2": "java",
    "plugins": "pdf, java",
    "Gender": "F",
    "Role": "Person",
    "Partner": "Partner1",
    "interactions": "7",
    "actions": "3",
    "operatingSystem": "Windows 10"
  },
  {
    "timestamp": 84615693,
    "interactionPosition": "3",
    "type": "action",
    "Value1": "0",
    "Conversions": "0",
    "Revenue": "0.00",
    "pluginName1": "pdf",
    "pluginIcon1": "pdf",
    "pluginName2": "java",
    "pluginIcon2": "java",
    "plugins": "pdf, java",
    "Gender": "F",
    "Role": "Person",
    "Partner": "Partner1",
    "interactions": "7",
    "actions": "3",
    "operatingSystem": "Windows 10",
    "name1": "value1",
    "name2": "value2"
  }
]

You may note in the above that some of the flattened key names have been replaced by an associated value (from inside the same nested structure). This isn't strictly necessary, but it would be a nice bonus. Also worth noting: my JSON is large (800MB), and I would like to handle that as well, but I suppose that point would be best posed in another question.

Thanks in advance for any assistance or advice!

The following answer does not deal with every requirement you've mentioned, but it will hopefully get you over the main hurdle you've evidently been facing.

Since your requirements regarding the "customVariables" are not clear to me, I'll ignore .customVariables completely, and hopefully you'll be able to handle .pluginsIcons yourself once you're over the main hurdle. So for clarity, I'll simply delete those keys.

As I understand it, you want some grouping based on .actionDetails to take place after the flattening. These requirements are also unclear to me, so let's focus on the flattening:

.[]
| .actionDetails[] + (del(.actionDetails) | del(.customVariables) | del(.pluginsIcons))

This produces a stream of JSON objects, the first two of which are:

{
  "timestamp": 84615195,
  "interactionPosition": "1",
  "type": "action",
  "Value1": "0",
  "Conversions": "0",
  "Revenue": "0.00",
  "serverTimestamp": 84615198,
  "plugins": "pdf, java",
  "interactions": "7",
  "actions": "3",
  "operatingSystem": "Windows 10"
}
{
  "timestamp": 84615145,
  "interactionPosition": "2",
  "type": "action",
  "Value1": "0",
  "Conversions": "0",
  "Revenue": "0.00",
  "serverTimestamp": 84615198,
  "plugins": "pdf, java",
  "interactions": "7",
  "actions": "3",
  "operatingSystem": "Windows 10"
}

This is very similar to the expected output you've shown, so hopefully you can take it from here.
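If you do want .customVariables and .pluginsIcons folded in as well, one possible extension is sketched below. Treat it as a sketch under assumptions, not a definitive solution: the names flatten_visits, flatten_vars and flatten_plugins are made up for illustration, each custom-variable object is assumed to hold exactly one key containing "Name" and one containing "Value", and the sample is a trimmed-down stand-in for your data.

```shell
#!/bin/sh
# Sketch only: flatten_visits, flatten_vars and flatten_plugins are
# illustrative names, not part of jq or of the filter shown earlier.

# Reads the visits array on stdin and prints one array of flat objects,
# one per entry in .actionDetails.
flatten_visits() {
  jq '
    # {"1": {"customVariableName1": "Role", "customVariableValue1": "Admin"}}
    # becomes {"Role": "Admin"}. Assumes one "...Name..." key and one
    # "...Value..." key per inner object; null input gives {}.
    def flatten_vars:
      (. // {})
      | [ .[]
          | to_entries
          | { key:   (map(select(.key | contains("Name")))  | .[0].value),
              value: (map(select(.key | contains("Value"))) | .[0].value) } ]
      | from_entries;

    # [{"pluginName": "pdf", "pluginIcon": "pdf1"}, ...] becomes
    # {"pluginName1": "pdf", "pluginIcon1": "pdf1", ...} (1-based suffix).
    def flatten_plugins:
      (. // [])
      | to_entries
      | map(.key as $i | .value | with_entries(.key += "\($i + 1)"))
      | add // {};

    [ .[]
      # Visit-level fields, with the two nested structures flattened.
      | ( del(.actionDetails, .customVariables, .pluginsIcons)
          + (.pluginsIcons | flatten_plugins)
          + (.customVariables | flatten_vars) ) as $visit
      # One output object per action; action-level variables are merged last.
      | .actionDetails[]
      | del(.customVariables) + $visit + (.customVariables | flatten_vars)
    ]
  '
}

# Trimmed-down sample in the same shape as the data in the question.
sample='[
  {
    "Value1": "0",
    "serverTimestamp": 84615198,
    "pluginsIcons": [
      {"pluginName": "pdf",  "pluginIcon": "pdf1"},
      {"pluginName": "java", "pluginIcon": "java1"}
    ],
    "customVariables": {
      "3": {"customVariableValue3": "F", "customVariableName3": "Gender"},
      "1": {"customVariableValue1": "Partner1", "customVariableName1": "Partner"}
    },
    "actionDetails": [
      {"timestamp": 84615195, "interactionPosition": "1", "type": "action"},
      {"timestamp": 84615693, "interactionPosition": "3", "type": "action",
       "customVariables": {
         "1": {"customVariablePageValue1": "value1", "customVariablePageName1": "name1"}
       }}
    ],
    "operatingSystem": "Windows 10"
  }
]'

printf '%s\n' "$sample" | flatten_visits
```

With a real file you would run something like flatten_visits < visits.json (the file name is hypothetical). Dropping the outer [ ... ] yields a stream of objects, as with the filter above, rather than a single array. Where keys collide, the right-hand operand of + wins, so reorder the operands if you would rather have action-level fields take precedence.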


 