Is there a simple way to convert a CSV with 0-indexed paths as keys to JSON with Miller?

Consider the following CSV:

email/1,email/2
abc@xyz.org,bob@pass.com

You can easily convert it to JSON (taking into account the paths defined by the keys) with Miller:

mlr --icsv --ojson --jflatsep '/' cat file.csv
[ { "email": ["abc@xyz.org", "bob@pass.com"] } ]

Now, if the paths are 0-indexed in the CSV (which is surely more common):

email/0,email/1
abc@xyz.org,bob@pass.com

Then, without prior knowledge of the field names, it seems that you have to rewrite the whole conversion:

Edit: replaced the hard-coded / with the FLATSEP built-in variable:

mlr --icsv --flatsep '/' put -q '
    begin { @labels = []; print "[" }

    # translate the original CSV header from 0-indexed to 1-indexed
    NR == 1 {
        i = 1;
        for (k in $*) {
            @labels[i] = joinv( apply( splita(k,FLATSEP), func(e) {
                return typeof(e) == "int" ? e+1 : e
            }), FLATSEP );
            i += 1;
        }
    }

    NR > 1 { print @object, "," }

    # create an object from the translated labels and the row values
    o = {};
    i = 1;
    for (k,v in $*) {
        o[@labels[i]] = v;
        i += 1;
    }
    @object = arrayify( unflatten(o,FLATSEP) );

    end { if (NR > 0) { print @object } print "]" }
' file.csv

I would like to know if I'm missing something obvious, like a command-line option or a way to rename the fields with the put verb, or maybe something else? You're also welcome to share your insights about the code above, as I'm not really confident in my Miller programming skills.


Update:

With @aborruso's approach of pre-processing the CSV header, this can be reduced to the following.
Note: I didn't keep the regextract part because it requires knowing the CSV header in advance.

mlr --csv -N --flatsep '/' put '
    NR == 1 {
        for (i,k in $*) {
            $[i] = joinv( apply( splita(k,FLATSEP), func(e) {
                return typeof(e) == "int" ? e+1 : e
            }), FLATSEP );
        }
    }
' file.csv |
mlr --icsv --flatsep '/' --ojson cat

Even if there are workarounds, like using the rename verb (when you know the header in advance) or pre-processing the CSV header, I still hope that Miller's author will add a command-line option to deal with this kind of 0-indexed external data; DSL functions like arrayify0 (and flatten0) could also prove useful in some cases.

"I would like to know if I'm missing something obvious, like a command line option or a way to rename the fields with put verb, or maybe something else?"

Starting from this:

email/0,email/1
abc@xyz.org,bob@pass.com

you can use an implicit CSV header (-N) and run

mlr --csv -N put 'if (NR == 1) {for (k in $*) {$[k] = "email/".string(int(regextract($[k],"[0-9]+"))+1)}}' input.csv

to get

email/1,email/2
abc@xyz.org,bob@pass.com
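
The one-liner above hard-codes the "email/" prefix, so it only fits this particular header. As a rough sketch of the same header pre-processing idea done outside Miller, plain awk can add one to every purely numeric path segment without knowing the field names. The function name shift_header is made up for illustration, and the sketch assumes '/' as the separator, LF line endings, and a header with no quoted fields (no embedded commas or quotes).

```shell
# Shift every purely numeric path segment in the CSV header by one,
# leaving the data rows untouched.
# Assumptions: '/' is the path separator; header fields are unquoted.
shift_header() {
    awk -F',' -v OFS=',' -v sep='/' '
        NR == 1 {
            for (f = 1; f <= NF; f++) {
                n = split($f, parts, sep)
                out = ""
                for (p = 1; p <= n; p++) {
                    seg = parts[p]
                    # 0-indexed -> 1-indexed for numeric segments only
                    if (seg ~ /^[0-9]+$/) seg = seg + 1
                    out = out (p > 1 ? sep : "") seg
                }
                $f = out
            }
        }
        { print }
    ' "$@"
}

printf 'email/0,email/1\nabc@xyz.org,bob@pass.com\n' | shift_header
# prints:
# email/1,email/2
# abc@xyz.org,bob@pass.com
```

The shifted CSV can then be piped into the conversion already shown in the question, mlr --icsv --flatsep '/' --ojson cat.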
