Extract data from invalid JSON using bash, sed, grep or awk?

I am trying to parse invalid JSON in bash:

x="{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"

using the following script:

for each in $(echo $x | sed 's/{componentId: /\n/g' ); do
    echo "Each: $each"
    echo [[ $each == 0Rb* ]]
    if [[ $each == 0Rb* ]]; then
        component=echo $each | awk -v FS="(componentName: |,|referenceName: |,)" '{print $3}'
        reference=echo $each | awk -v FS="(componentName: |,|referenceName: |,)" '{print $6}'
        echo "component: $component"
        echo "reference: $component"
    fi
done

but it doesn't work, and I don't understand why. When I execute this line in the console,

echo $x | sed 's/{componentId: /\n/g' 

I can see that the invalid JSON is split into lines correctly, but when I pass this to the for loop, the each variable receives much smaller chunks as its value:

Each: 00N5E000005vm9e,

I am confused.

What I am trying to do is extract the value between componentName: and the following comma, and the value between referenceName: and the following comma, for each item in the invalid JSON whose componentId doesn't start with 00N. Is there a way to achieve this?

I have also tried jq -n $x, but it fails with jq: error: syntax error, unexpected IDENT, expecting '}' (Unix shell quoting issues?) at <top-level>, line 1:

Treat the data as JSON

Convert it back to valid JSON with sed, e.g.:

# Remove redundant space (assuming the text is in the `x` variable)
<<<"$x" sed 's/: /:/g; s/, /,/g' |

# Quote all "words"
sed -E 's/[^"{}:,]+/"&"/g'       |

# Separate objects
sed 's/},{/}\n{/g'               |

# Parse json
jq .

Output:

{
  "componentId": "00N5E000005vm9e",
  "componentName": "Field",
  "referenceId": "0M05E0000002XbV",
  "referenceName": "RecordPageName1",
  "referenceUrl": "null",
  "message": "Component is in use by another component in your organization.",
  "reasonCode": "10"
}
{
  "componentId": "00N5E000005vm9e",
  "componentName": "Field",
  "referenceId": "0M05E0000002XbV",
  "referenceName": "RecordPageName1",
  "referenceUrl": "null",
  "message": "Component is in use by another component in your organization.",
  "reasonCode": "10"
}
{
  "componentId": "00N5E000005vm9e",
  "componentName": "Field",
  "referenceId": "0M05E0000002XbV",
  "referenceName": "RecordPageName1",
  "referenceUrl": "null",
  "message": "Component is in use by another component in your organization.",
  "reasonCode": "10"
}
{
  "componentId": "0Rb5E000000BGVi",
  "componentName": "Versions",
  "referenceId": "0M05E0000002XbV",
  "referenceName": "RecordPageName1",
  "referenceUrl": "null",
  "message": "Component is in use by another component in your organization.",
  "reasonCode": "10"
}
{
  "componentId": "0Rb5E000000BGVj",
  "componentName": "Approves",
  "referenceId": "0M05E0000002XbV",
  "referenceName": "RecordPageName1",
  "referenceUrl": "null",
  "message": "Component is in use by another component in your organization.",
  "reasonCode": "10"
}
{
  "componentId": "0Rb5E000000BGVe",
  "componentName": "activityThreads",
  "referenceId": "0M05E0000002XbV",
  "referenceName": "RecordPageName1",
  "referenceUrl": "null",
  "message": "Component is in use by another component in your organization.",
  "reasonCode": "10"
}
{
  "componentId": "0Rb5E000000BGVf",
  "componentName": "Attachments",
  "referenceId": "0M05E0000002XbV",
  "referenceName": "RecordPageName1",
  "referenceUrl": "null",
  "message": "Component is in use by another component in your organization.",
  "reasonCode": "10"
}
{
  "componentId": "0Rb5E000000BGVh",
  "componentName": "Details",
  "referenceId": "0M05E0000002XbV",
  "referenceName": "RecordPageName1",
  "referenceUrl": "null",
  "message": "Component is in use by another component in your organization.",
  "reasonCode": "10"
}
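
Note that in the output above every value, including null and 10, is a JSON string, because the second sed quotes every token. If the types matter, one more substitution could unquote them. The following is only a sketch under assumptions: to_json is a hypothetical helper name, the sample record is shortened, and only bare null, true, false and integers without leading zeros are unquoted (a message whose entire text is such a token would be unquoted too).

```bash
# Hypothetical, shortened record in the question's format
sample='{componentId: 0Rb5E000000BGVi, referenceUrl: null, reasonCode: 10}'

# to_json is a hypothetical helper: the sed steps from the answer above,
# plus one extra pass that restores null/true/false and plain integers.
to_json() {
  sed 's/: /:/g; s/, /,/g' |
  sed -E 's/[^"{}:,]+/"&"/g' |
  sed -E 's/:"(null|true|false|-?(0|[1-9][0-9]*))"([,}])/:\1\3/g'
}

json=$(printf '%s\n' "$sample" | to_json)
printf '%s\n' "$json"
```

The ID 0Rb5E000000BGVi stays quoted because it does not match the integer pattern as a whole; the result can be piped to jq as before.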

To iterate over componentId and referenceId you could use jq's @tsv formatting operator, e.g.:

... | jq -r '[ .componentId, .referenceId ] | @tsv'

Output:

00N5E000005vm9e 0M05E0000002XbV
00N5E000005vm9e 0M05E0000002XbV
00N5E000005vm9e 0M05E0000002XbV
0Rb5E000000BGVi 0M05E0000002XbV
0Rb5E000000BGVj 0M05E0000002XbV
0Rb5E000000BGVe 0M05E0000002XbV
0Rb5E000000BGVf 0M05E0000002XbV
0Rb5E000000BGVh 0M05E0000002XbV

Treat the data as YAML

As noted by @léa, you can use yq to parse this string as a YAML array. Here is my take on that approach using version 4.13.2 of Mike Farah's yq:

<<<"[$x]" yq e '.[] | .componentId + " " + .referenceId' -

Output:

00N5E000005vm9e 0M05E0000002XbV
00N5E000005vm9e 0M05E0000002XbV
00N5E000005vm9e 0M05E0000002XbV
0Rb5E000000BGVi 0M05E0000002XbV
0Rb5E000000BGVj 0M05E0000002XbV
0Rb5E000000BGVe 0M05E0000002XbV
0Rb5E000000BGVf 0M05E0000002XbV
0Rb5E000000BGVh 0M05E0000002XbV

Parse the variables in a bash loop

You can pipe the result of the above solutions to a while read loop, e.g.:

... | while read -r componentId referenceId; do
  : Do your processing here with $componentId and $referenceId
done
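
The while read loop matters because a for loop over an unquoted expansion, as in the question, is split on every space as well as every newline, which is why the each variable received small chunks such as 00N5E000005vm9e,. A minimal sketch with two hypothetical, shortened records (the variable names are illustrative only):

```bash
# Two hypothetical records, one per line (shortened from the question's data)
lines='0Rb1, componentName: Versions
0Rb2, componentName: Details'

# Unquoted expansion: the shell splits on spaces AND newlines,
# so the loop body runs once per word.
count_for=0
for each in $lines; do
  count_for=$((count_for + 1))
done

# while read: the loop body runs once per line.
count_while=0
while IFS= read -r each; do
  count_while=$((count_while + 1))
done <<EOF
$lines
EOF

echo "for: $count_for chunks, while read: $count_while lines"
```

With this sample the for loop sees six words, while the while read loop sees the two lines.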

This input string is the body of a YAML array of objects, so parse it with a YAML parser.

With Python:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys
import yaml
import json

# Your input: invalid JSON, but valid YAML elements of an array
x = "{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"

# Compose yamlstring from x by adding the missing data array container
yamlstring = "data: [" + x + "]"

# Load data from the yamlstring
data = yaml.load(yamlstring, yaml.SafeLoader)

# Output data as JSON
json.dump(data, sys.stdout, indent=2)

Or from a shell, using yq as the parser:

#!/usr/bin/env sh

x="{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"

yamlstring="data: [$x]"

printf %s "$yamlstring" | yq -I 4 -o json e '.' -

Thanks for the comments; it looks like I have figured this out.

# Quote "$x" so the shell does not re-split or glob the data;
# IFS= (empty) keeps each line intact (IFS=\n would set IFS to the letter n)
echo "$x" | sed 's/{componentId: /\n/g' | while IFS= read -r each; do
    #echo "Each: $each"
    # Only process records whose componentId starts with 0Rb
    if [[ $each == 0Rb* ]]; then
        component=$(echo "$each" | awk -v FS="(componentName: |,|referenceName: )" '{print $3}')
        reference=$(echo "$each" | awk -v FS="(componentName: |,|referenceName: )" '{print $6}')
        echo "component: $component"
        echo "reference: $reference"
    fi
done
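
The same extraction can also be done without a shell loop, in a single awk pass that looks fields up by key instead of by position. This is only a sketch under assumptions: extract_pairs is a hypothetical helper name, the sample is shortened to two records, and no value is assumed to contain a comma followed by a space or a closing brace.

```bash
# Shortened sample: two of the question's records
x="{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"

# extract_pairs is a hypothetical helper: prints componentName and
# referenceName (tab-separated) for records whose componentId
# does not start with 00N.
extract_pairs() {
  tr '}' '\n' |          # one object per line
  sed 's/^[, ]*{//' |    # drop the leading ", {"
  awk -F', ' '
    {
      split("", rec)     # portable way to clear the array
      for (i = 1; i <= NF; i++) {
        p = index($i, ": ")
        if (p > 0) rec[substr($i, 1, p - 1)] = substr($i, p + 2)
      }
    }
    ("componentId" in rec) && rec["componentId"] !~ /^00N/ {
      print rec["componentName"] "\t" rec["referenceName"]
    }
  '
}

result=$(printf '%s\n' "$x" | extract_pairs)
printf '%s\n' "$result"
```

Looking values up by key keeps the script working if the field order changes, which the positional $3 and $6 approach does not.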
