[英]having trouble converting csv to json using bash jq
我有一个流输出存储在 csv 文件中,我需要帮助将 csv 转换为 json:
我的 csv 看起来像:
cat output.csv
"k","a1",1,"b1","c1","d1",1
"l","a2",2,"b2","c2","d2",2
"m","a3",3,"b3","c3","d3",3
"n","a4",4,"b4","c4","d4",4
"o","a5",5,"b5","c5","d5",5
所需输出:
注意:我需要将密钥configuration
添加到 json 中。
{
"configuration": {
"k": {
"a": "a1",
"number1": "1",
"c": "b1",
"d": "c1",
"e": "d1",
"number2": "1"
},
"l": {
"a": "a2",
"number1": "2",
"c": "b2",
"d": "c2",
"e": "d2",
"number2": "2"
},
.
.
.
}
}
到目前为止尝试使用 jq:
我的功能是:
cat api.jq
[
inputs |
split(",") |
map(ltrimstr("\"")) |
map(rtrimstr("\"")) |
{
a: .[1],
number1: .[2],
c: .[3],
d: .[4],
e: .[5],
number2: .[6]
}
] | {configuration: .}
输出:
jq -nRf api.jq output.csv
{
"cluster_configuration": [
{
"a": "a1",
"number1": "1",
"c": "b1",
"d": "c1",
"e": "d1",
"number2": "1"
},
{
"a": "a2",
"number1": "2",
"c": "b2",
"d": "c2",
"e": "d2",
"number2": "2"
},
{
"a": "a3",
"number1": "3",
"c": "b3",
"d": "c3",
"e": "d3",
"number2": "3"
},
{
"a": "a4",
"number1": "4",
"c": "b4",
"d": "c4",
"e": "d4",
"number2": "4"
},
{
"a": "a5",
"number1": "5",
"c": "b5",
"d": "c5",
"e": "d5",
"number2": "5"
}
]
}
只要你的目标是成为a
关键,使用from_entries
是适合的:
[
inputs |
split(",") |
map(ltrimstr("\"")) |
map(rtrimstr("\"")) |
{
"key": .[1],
"value": {
number: .[2],
c: .[3],
d: .[4],
e: .[5],
number: .[6]
}
}
] |
from_entries |
{ configuration: . }
运行时
jq -R -f api.jq <output.csv
...输出是:
{
"configuration": {
"a2": {
"number": "2",
"c": "b2",
"d": "c2",
"e": "d2"
},
"a3": {
"number": "3",
"c": "b3",
"d": "c3",
"e": "d3"
},
"a4": {
"number": "4",
"c": "b4",
"d": "c4",
"e": "d4"
},
"a5": {
"number": "5",
"c": "b5",
"d": "c5",
"e": "d5"
}
}
}
如果 CSV 解析的稳健性是一个问题,您可以轻松地调整rosettacode.org上的解析器。 以下将 CSV 行转换为 JSON 数组; 由于下面的“主”程序使用inputs
,您将使用 -R 和 -n 命令行选项。
## The PEG * operator:
def star(E): (E | star(E)) // . ;
## Helper functions:
# Consume a regular expression rooted at the start of .remainder, or emit empty;
# on success, update .remainder and set .match but do NOT update .result
def consume($re):
# on failure, match yields empty
(.remainder | match("^" + $re)) as $match
| .remainder |= .[$match.length :]
| .match = $match.string;
def parse($re):
consume($re)
| .result = .result + [.match] ;
def ws: consume(" *");
### Parse a string into comma-separated values
def quoted_field_content:
parse("((\"\")|([^\"]))*")
| .result[-1] |= gsub("\"\""; "\"");
def unquoted_field: parse("[^,\"]*");
def quoted_field: consume("\"") | quoted_field_content | consume("\"");
def field: (ws | quoted_field | ws) // unquoted_field;
def record: field | star(consume(",") | field);
def csv2array:
{remainder: .} | record | .result;
inputs | csv2array
我知道您将这个问题作为bash+jq
问题提出,但是,如果它是bash+python
问题,解决方案将是微不足道的:
# csv2json.py
import sys, csv, json
data = { "configuration": { } }
for [k,a,n1,c,d,e,n2] in csv.reader(sys.stdin.readlines()):
data["configuration"][k] = { "a": a, "number1": n1, "c": c, "d": d, "e": e, "number2": n2 }
print(json.dumps(data, indent=2))
然后,在 bash 中(我假设这里是 Ubuntu),我们可以:
python3 csv2json.py < output.csv
这是Miller的可能解决方案(可在此处用于多个操作系统),这是一个支持多种输入/输出格式的有趣工具:
mlr --icsv -N put -q '
@map[$1] = {"a": $2, "number1": $3, "c": $4, "d": $5, "e": $6, "number2": $7}
end { dump { "configuration": @map } }
' file.csv
{
"configuration": {
"k": {
"a": "a1",
"number1": 1,
"c": "b1",
"d": "c1",
"e": "d1",
"number2": 1
},
"l": {
...
注意:要强制将数字视为字符串,您可以使用--infer-none
选项。
你需要一个像 miller 这样的好工具,首先将你的 csv 转换为 json。 那么使用 jq 就更容易了。
cat <(echo k,a,number1,c,d,e,number2) output.csv > output_with_header.csv
所以现在你有一个更好的 csv。
mlr --icsv --ojson cat output_with_header.csv > output.json
jq '{configuration: ([.[]|{key: .k,value: (.|del(.k))}]|from_entries)}' output.json
那给你:
{
"configuration": {
"k": {
"a": "a1",
"number1": 1,
"c": "b1",
"d": "c1",
"e": "d1",
"number2": 1
},
"l": {
"a": "a2",
"number1": 2,
"c": "b2",
"d": "c2",
"e": "d2",
"number2": 2
},
"m": {
"a": "a3",
"number1": 3,
"c": "b3",
"d": "c3",
"e": "d3",
"number2": 3
},
"n": {
"a": "a4",
"number1": 4,
"c": "b4",
"d": "c4",
"e": "d4",
"number2": 4
},
"o": {
"a": "a5",
"number1": 5,
"c": "b5",
"d": "c5",
"e": "d5",
"number2": 5
}
}
}
一起作为一个班轮:
cat <(echo k,a,number1,c,d,e,number2) output.csv | mlr --icsv --ojson cat | jq '{configuration: ([.[]|{key: .k,value: (.|del(.k))}]|from_entries)}'
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.