Parsing a JSON file with the jq tool

I have the following nested JSON file, which I want to parse with jq and print as a table like the one shown at the end.

The input.json structure is like this:

{
 "document":{
  "page":[
     {
        "@index":"0",
        "image":{
           "@data":"ABC",
           "@format":"png",
           "@height":"620.00",
           "@type":"base64encoded",
           "@width":"450.00",
           "@x":"85.00",
           "@y":"85.00"
        }
     },
     {
        "@index":"1",
        "row":[
           {
              "column":[
                 {
                    "text":""
                 },
                 {
                    "text":{
                       "#text":"Text1",
                       "@fontName":"Arial",
                       "@fontSize":"12.0",
                       "@height":"12.00",
                       "@width":"71.04",
                       "@x":"121.10",
                       "@y":"83.42"
                    }
                 }
              ]
           },
           {
              "column":[
                 {
                    "text":""
                 },
                 {
                    "text":{
                       "#text":"Text2",
                       "@fontName":"Arial",
                       "@fontSize":"12.0",
                       "@height":"12.00",
                       "@width":"101.07",
                       "@x":"121.10",
                       "@y":"124.82"
                    }
                 }
              ]
           }
        ]
     },
     {
        "@index":"2",
        "row":[
           {
              "column":{
                 "text":{
                    "#text":"Text3",
                    "@fontName":"Arial",
                    "@fontSize":"12.0",
                    "@height":"12.00",
                    "@width":"363.44",
                    "@x":"85.10",
                    "@y":"69.62"
                 }
              }
           },
           {
              "column":{
                 "text":{
                    "#text":"Text4",
                    "@fontName":"Arial",
                    "@fontSize":"12.0",
                    "@height":"12.00",
                    "@width":"382.36",
                    "@x":"85.10",
                    "@y":"83.42"
                 }
              }
           },
           {
              "column":{
                 "text":{
                    "#text":"Text5",
                    "@fontName":"Arial",
                    "@fontSize":"12.0",
                    "@height":"12.00",
                    "@width":"435.05",
                    "@x":"85.10",
                    "@y":"97.22"
                 }
              }
           }
        ]
     },
     {
        "@index":"3"
     }
  ]
 }
}

Following the answers to a related question (Parsing nested json with jq), I tried this command, but it doesn't work:

$ cat file.json | jq .document.page[].row | ["#text", "@x", "@y"] | @csv

The output I'm trying to get is:

#text @x     @y
Text1 121.10 83.42
Text2 121.10 124.82
Text3 85.10  69.62
Text4 85.10  83.42
Text5 85.10  97.22

How can I achieve this?

Thanks

UPDATE

Thanks so much for the help. I've tried it with a real file that is a bit longer.

I was able to adapt peak's first solution like this:

["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"], 
( .. 
| objects 
| select(has("#text","@data")) 
| [.["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"]]
)  
| @tsv

With the new input I get this table:

+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| #text         | @data | @fontName | @fontSize | @format | @height | @type         | @width | @x     | @y     |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
|               | ABC   |           |           | png     | 620     | base64encoded | 450    | 85     | 85     |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text ä 1      |       | Tahoma    | 12        |         | 12      |               | 427.79 | 85.1   | 69.62  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text ¢76      |       | Tahoma    | 12        |         | 12      |               | 270.5  | 85.1   | 690.72 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text % 5      |       | Tahoma    | 12        |         | 12      |               | 130.84 | 358.86 | 690.72 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 7Ç8      |       | Tahoma    | 12        |         | 12      |               | 115.95 | 85.1   | 704.52 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text • 2 Wñ79 |       | Tahoma    | 8         |         | 8.04    |               | 398.16 | 121.1  | 68.06  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text          |       | Tahoma    | 12        |         | 12      |               | 101.5  | 85.1   | 83.42  |
|   » 1 A\\\\CÓ |       |           |           |         |         |               |        |        |        |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 12       |       | Tahoma    | 12        |         | 12      |               | 312.26 | 189.83 | 83.42  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 82       |       | Tahoma    | 12        |         | 12      |               | 44.99  | 85.1   | 97.22  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 31       |       | Tahoma    | 8         |         | 8.04    |               | 381.83 | 133.1  | 95.66  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+

If possible, how can I add the following three columns (counter, page and row), so that each line shows its corresponding page and row?

The expected output would be like this:

+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| counter | page | row | #text             | @data | @fontName | @fontSize | @format | @height | @type         | @width | @x     | @y     |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 1       | 0    |     |                   | ABC   |           |           | png     | 620     | base64encoded | 450    | 85     | 85     |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 2       | 1    | 0   | Text ä 1          |       | Tahoma    | 12        |         | 12      |               | 427.79 | 85.1   | 69.62  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 3       | 1    | 1   | Text ¢76          |       | Tahoma    | 12        |         | 12      |               | 270.5  | 85.1   | 690.72 |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 4       | 1    | 1   | Text % 5          |       | Tahoma    | 12        |         | 12      |               | 130.84 | 358.86 | 690.72 |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 5       | 2    | 2   | Text 7Ç8          |       | Tahoma    | 12        |         | 12      |               | 115.95 | 85.1   | 704.52 |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 6       | 2    | 0   | Text • 2 Wñ79     |       | Tahoma    | 8         |         | 8.04    |               | 398.16 | 121.1  | 68.06  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 7       | 2    | 1   | Text  » 1 A\\\\CÓ |       | Tahoma    | 12        |         | 12      |               | 101.5  | 85.1   | 83.42  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 8       | 2    | 1   | Text 12           |       | Tahoma    | 12        |         | 12      |               | 312.26 | 189.83 | 83.42  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 9       | 2    | 2   | Text 82           |       | Tahoma    | 12        |         | 12      |               | 44.99  | 85.1   | 97.22  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 10      | 2    | 2   | Text 31           |       | Tahoma    | 8         |         | 8.04    |               | 381.83 | 133.1  | 95.66  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+

This is a new, more representative input file: input2.json.

The JSON structure in the image below gives an idea of the page and row numbers present in the JSON file and the values within them.

[image]

Here's a simple (perhaps too simple?) approach that focuses on the embedded JSON objects that have a "#text" attribute:

["#text", "@x", "@y"],       # the header
( ..
  | objects
  | select(has("#text"))  
  | [.["#text", "@x", "@y"]] # a row
) 
| @csv

When given this program and the sample input, an invocation of jq using the -r option would produce:

"#text","@x","@y"
"Text1","121.10","83.42"
"Text2","121.10","124.82"
"Text3","85.10","69.62"
"Text4","85.10","83.42"
"Text5","85.10","97.22"

If you don't want the quotation marks and are willing to risk that the output is not strictly CSV, then one option would be to use join(",") instead of @csv at the end of the pipeline.
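As a rough illustration of that trade-off, the sketch below runs the same filter with join(",") on a cut-down stand-in for input.json. The sample variable and its two text objects are assumptions made for the demo, not the full input.

```shell
# Cut-down stand-in for input.json (assumption: two text objects suffice for the demo).
sample='{"document":{"page":[
  {"row":[{"column":[{"text":""},{"text":{"#text":"Text1","@x":"121.10","@y":"83.42"}}]}]},
  {"row":[{"column":{"text":{"#text":"Text3","@x":"85.10","@y":"69.62"}}}]}
]}}'

# Same pipeline as above, but each row is joined with commas instead of passing through @csv.
out=$(printf '%s' "$sample" | jq -r '
  ["#text", "@x", "@y"],
  ( ..
    | objects
    | select(has("#text"))
    | [.["#text", "@x", "@y"]]
  )
  | join(",")
')
printf '%s\n' "$out"
```

A value that itself contains a comma or a double quote would come out unescaped here, which is the risk mentioned above.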

Variants

You might want to use @tsv instead of @csv.

If a more restrictive approach to selecting the relevant embedded objects is needed, then perhaps replacing .. with .. | .text? will suffice.

If not, additional filters can be added depending on the detailed requirements.
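To make the effect of the more restrictive variant concrete, here is a small sketch. The sample is hypothetical: it adds a "footer" object (a made-up key, not present in input.json) that also carries a "#text" key, so that the difference between .. and .. | .text? becomes visible.

```shell
# Hypothetical sample: one object under "text", one look-alike under a made-up "footer" key.
sample='{"page":[
  {"column":{"text":{"#text":"Text1","@x":"121.10","@y":"83.42"}}},
  {"footer":{"#text":"Other","@x":"0.00","@y":"0.00"}}
]}'

# Only objects reached through a "text" key are considered; the trailing ? in .text?
# suppresses the errors raised when .text is applied to arrays or strings.
out=$(printf '%s' "$sample" | jq -r '
  ( .. | .text?
    | objects
    | select(has("#text"))
    | [.["#text", "@x", "@y"]]
  )
  | @tsv
')
printf '%s\n' "$out"
```

With plain .. in place of .. | .text?, the "Other" row would be selected as well.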

Here's a solution that uses a "drill-down" approach, which is therefore rather tedious:

["#text", "@x", "@y"],
( .document.page[]
  | .row[]?
  | .column
  | (if type == "array" then .[] else . end)
  | .text
  | objects
  | [.["#text", "@x", "@y"]]
)
| @tsv

This would be used in conjunction with the -r command-line option.

I've used @tsv as this produces output that resembles the given expected output. As mentioned elsewhere on this page, there are other alternatives, e.g. using join/1.
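The step that makes the drill-down work on this input is the normalization of .column, which is an array on some pages and a single object on others. A minimal sketch of just that step, on an assumed two-row sample:

```shell
# Assumed sample: the first row has an array of columns, the second a single column object.
sample='{"row":[
  {"column":[{"text":""},{"text":{"#text":"Text1"}}]},
  {"column":{"text":{"#text":"Text3"}}}
]}'

out=$(printf '%s' "$sample" | jq -r '
  .row[]
  | .column
  | (if type == "array" then .[] else . end)  # array: one output per element, object: itself
  | .text
  | objects                                   # drops the empty-string values
  | .["#text"]
')
printf '%s\n' "$out"
```

Both shapes end up as a stream of column objects, so the rest of the pipeline does not need to care which one it was given.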

For those who are interested in alternative solutions, here's how to achieve the same result using jtc, a walk-path Unix tool for JSON:

bash $ jtc -qq -w'<>a' -T'"#text\t@x\t@y"' -w'<@x>l:<x>v[-1][@y]<y>v[-1][#text]' -T'"{}\t{x}\t{y}"' file.json
#text   @x      @y
Text1   121.10  83.42
Text2   121.10  124.82
Text3   85.10   69.62
Text4   85.10   83.42
Text5   85.10   97.22
bash $ 

Walk path (-w) breakdown:

  • <@x>l:<x>v finds each label @x and memorizes the found JSON value in the namespace x
  • [-1][@y]<y>v addresses the parent (of the last found value), then addresses JSON by the label @y and memorizes its value in the namespace y
  • [-1][#text] does the same for the #text label (note: without memorizing the last value)

  • -T'"{}\t{x}\t{y}"' applies a template with interpolation ({} interpolates the last found value, hence there was no need to memorize it in the namespace)
  • -qq unquotes the resulting JSON string (dropping quotation marks and translating \t into tabs)
  • the first walk (-w'<>a') is just a dummy one to trigger template interpolation for the header line

PS. Disclosure: I'm the creator of jtc, a shell CLI tool for JSON operations.

Handling input2.json

Since the second set of requirements (the one for input2.json) needs some context-dependent information, the context cannot be ignored, and so the following solution uses a "drill-down" approach. It will be a bit difficult to follow unless you understand foreach, so let me just mention that the approach essentially uses a state variable with the keys counter, page and row to keep track of the three counters.

["counter", "page", "row", "#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"], 
(foreach (.document.page[] | objects) as $page ({page: -1, counter: 0};
  .page += 1
  | foreach ($page | .row[]?) as $row (.row=-1;
    .row += 1
    | foreach ($row | (.column | (if type == "array" then .[] else . end )) | .text | objects) as $x (.;
      .counter += 1
      | .out = [.counter, .page, .row, $x["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"]]
      ; . )
      ; . )
      ; .out )
)
| @tsv

This produces the desired TSV except for the very first line of data, as that has no row. One way to include the first line is shown in my answer to "Relate elements in table form from Json file with jq".
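For readers new to foreach, the state-threading idea can be seen in isolation in the stripped-down sketch below. The sample, with its "page" and "item" keys, is invented for the illustration and is much flatter than input2.json. Only two of the three counters are kept.

```shell
# Invented sample: two pages, each holding a flat list of items.
sample='{"page":[{"item":["a","b"]},{"item":["c"]}]}'

out=$(printf '%s' "$sample" | jq -r '
  foreach .page[] as $page (
    {page: -1, counter: 0};                  # initial state
    .page += 1                               # update: move on to the next page
    | foreach ($page | .item[]) as $x (.;
        .counter += 1
        | .out = [.counter, .page, $x];
        .)
    ;
    .out                                     # extract: emit the row built by the update
  )
  | @tsv
')
printf '%s\n' "$out"
```

The counter keeps running across pages because the inner foreach starts from the outer state (the . used as its initial value), and the last state it emits becomes the outer state for the next page.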

In this command:

$ cat file.json | jq .document.page[].row | ["#text", "@x", "@y"] | @csv

Everything after the jq is supposed to be the first argument to jq, which means you need to enclose it in quotation marks. Moreover, cat file.json | is a Useless Use of Cat here; just pass the filename as an argument to jq. Hence, the correctly quoted command is:

$ jq '.document.page[].row | ["#text", "@x", "@y"] | @csv' file.json
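Quoting makes the command parse, but the filter still pipes each page's .row into a constant array, so the actual values never reach @csv. A quick check on a hypothetical two-page stand-in for file.json (with -r added for readable output) suggests it prints only the header, once per page:

```shell
# Hypothetical two-page stand-in for file.json: page 0 has no rows, page 1 has one.
sample='{"document":{"page":[
  {"@index":"0"},
  {"@index":"1","row":[{"column":{"text":{"#text":"Text1","@x":"121.10","@y":"83.42"}}}]}
]}}'

# The array literal ignores its input, so every page yields the same header line.
out=$(printf '%s' "$sample" | jq -r '.document.page[].row | ["#text", "@x", "@y"] | @csv')
printf '%s\n' "$out"
```

To get the data rows, the quoting fix has to be combined with a filter that actually selects the text objects, such as the ones in the other answers on this page.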
