Is there a simple way to convert a CSV with 0-indexed paths as keys to JSON with Miller?

Consider the following CSV:

email/1,email/2
abc@xyz.org,bob@pass.com

You can easily convert it to JSON (taking into account the paths defined by the keys) with Miller:

mlr --icsv --ojson --jflatsep '/' cat file.csv
[ { "email": ["abc@xyz.org", "bob@pass.com"] } ]

Now, if the paths are 0-indexed in the CSV (which is surely more common):

email/0,email/1
abc@xyz.org,bob@pass.com

Then, without prior knowledge of the field names, it seems that you have to rewrite the whole conversion:

Edit: replaced the hard-coded / with the FLATSEP built-in variable:

mlr --icsv --flatsep '/' put -q '
    begin { @labels = []; print "[" }

    # translate the original CSV header from 0-indexed to 1-indexed
    NR == 1 {
        i = 1;
        for (k in $*) {
            @labels[i] = joinv( apply( splita(k,FLATSEP), func(e) {
                return typeof(e) == "int" ? e+1 : e
            }), FLATSEP );
            i += 1;
        }
    }

    NR > 1 { print @object, "," }

    # create an object from the translated labels and the row values
    o = {};
    i = 1;
    for (k,v in $*) {
        o[@labels[i]] = v;
        i += 1;
    }
    @object = arrayify( unflatten(o,FLATSEP) );

    end { if (NR > 0) { print @object } print "]" }
' file.csv

I would like to know if I'm missing something obvious, like a command-line option or a way to rename the fields with the put verb, or maybe something else? You're also welcome to share your insights about the code above, as I'm not really confident in my Miller programming skills.


Update:

With @aborruso's approach of pre-processing the CSV header, this can be reduced to the following.
Note: I didn't keep the regextract part because it requires knowing the CSV header in advance.

mlr --csv -N --flatsep '/' put '
    NR == 1 {
        for (i,k in $*) {
            $[i] = joinv( apply( splita(k,FLATSEP), func(e) {
                return typeof(e) == "int" ? e+1 : e
            }), FLATSEP );
        }
    }
' file.csv |
mlr --icsv --flatsep '/' --ojson cat

Even though there are workarounds, like using the rename verb (when you know the header in advance) or pre-processing the CSV header, I still hope that Miller's author will add a command-line option for this kind of 0-indexed external data; a DSL function like arrayify0 (and flatten0) could also prove useful in some cases.
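The header pre-processing workaround can also be sketched outside Miller. The following is a minimal sketch, not a Miller feature: shift_header is a hypothetical helper name, and it assumes an unquoted header line, "," as the field separator, "/" as the path separator, and LF line endings. It adds 1 to every purely numeric path segment of the first line and passes the data rows through untouched.

```shell
# shift_header (hypothetical helper): add 1 to every purely numeric path
# segment in the first CSV line; all other lines are printed unchanged.
# Assumes an unquoted header, "," between fields and "/" between path segments.
shift_header() {
  awk '
    NR == 1 {
      n = split($0, cols, ",")
      line = ""
      for (i = 1; i <= n; i++) {
        m = split(cols[i], parts, "/")
        col = ""
        for (j = 1; j <= m; j++) {
          # only segments made entirely of digits are treated as indices
          if (parts[j] ~ /^[0-9]+$/) parts[j] = parts[j] + 1
          col = col (j > 1 ? "/" : "") parts[j]
        }
        line = line (i > 1 ? "," : "") col
      }
      print line
      next
    }
    { print }   # data rows pass through unchanged
  ' "$@"
}
```

The rewritten CSV can then be piped into the plain conversion, for example: shift_header file.csv | mlr --icsv --ojson --flatsep '/' cat. Unlike the Miller-only version, this does not understand CSV quoting, so it only suits simple headers.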

"I would like to know if I'm missing something obvious, like a command line option or a way to rename the fields with put verb, or maybe something else?"

Starting from this input:

email/0,email/1
abc@xyz.org,bob@pass.com

you can have Miller treat the header line as data (-N, i.e. implicit CSV header plus headerless CSV output) and run

mlr --csv -N put 'if (NR == 1) {for (k in $*) {$[k] = "email/".string(int(regextract($[k],"[0-9]+"))+1)}}' input.csv

to get

email/1,email/2
abc@xyz.org,bob@pass.com
