Extract part of one column and save into another file using awk

Question

I need to extract fields from a CSV file. There are two columns, billing_info and key_id. billing_info is an object that holds multiple data items in curly braces. I need to extract billing_info.id_encrypted and key_id into a different file.

input.csv

    billing_info,key_id
    {id: '1B82', id_encrypted: '1Q4AW5bwyU', address: 'san jose', phone: '13423', country: 'v73jyqgE='},bf6-96f751

output.csv

    billing_info.id_encrypted,key_id
    1Q4AW5bwyU,bf6-96f751

How can I use awk to extract the data in the format shown in output.csv?

Answer 1

Making some assumptions:

the first line of input lists the column names
the brace-delimited element contains an arbitrary number of comma-separated key-value pairs
key-value pairs can appear in an arbitrary order
values are delimited by single-quotes
commas cannot appear inside keys or values
single-quotes do not appear anywhere else

<csvfile awk -F, '
    BEGIN {
        getline
        print "billing_info.id_encrypted,key_id"
    }
    {
        for (i=1; i<NF; i++)
            if ($i ~ /id_encrypted/)
                split($i, e, /\047/)
        print e[2] "," $NF
    }
'

Notes:

-F, splits input lines into comma-separated fields
BEGIN section handles the header
- we output the header even if there is no input
for loop runs through all the fields (except the final one)
($i ~ /id_encrypted/) looks for any field that contains the key name
split splits that field on single quotes ( /\047/ , the octal escape for ' )
print outputs the value found, and the final field

Answer 2

Here is a short one-liner using awk. It assumes the keys always appear in the same order as in the sample:

awk -F ":" '{split($3,arr1,",");split($6,arr2,",");print arr1[1] "," arr2[2]}' input.csv > output.csv

With an explanation:

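-F ":" makes : the awk field separator.

split($3,arr1,",") splits the 3rd field on , into an array with 2 elements.

split($6,arr2,",") splits the 6th field on , into an array with 2 elements.

Then print the first element of arr1 and the second element of arr2. Note that the printed value still carries its leading space and single quotes, and the header line, which contains no colon, comes out as a lone comma.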
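
A hedged sketch of how the one-liner could be adjusted to produce the exact output.csv format. It keeps the same positional assumption (id_encrypted is always the second key and key_id follows the sixth colon-separated field); the shell variables line and out are names chosen for illustration.

```shell
# Sample data row from the question, held in a shell variable for the demo.
line="{id: '1B82', id_encrypted: '1Q4AW5bwyU', address: 'san jose', phone: '13423', country: 'v73jyqgE='},bf6-96f751"

out=$(
    printf 'billing_info,key_id\n%s\n' "$line" |
    awk -F ":" '
        NR == 1 { print "billing_info.id_encrypted,key_id"; next }   # replace the header
        {
            split($3, arr1, ",")
            split($6, arr2, ",")
            gsub(/^ *\047|\047$/, "", arr1[1])   # trim the leading space and both single quotes
            print arr1[1] "," arr2[2]
        }
    '
)

printf '%s\n' "$out"
```

If the key order can vary between rows, a search by key name is the safer choice.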

Answer 3

I recommend you just convert your whole input to CSV and THEN you can trivially extract whatever fields you like from it using awk or Excel or any other tool, e.g.:

$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 {
    split($0,hdr)
    next
}
{
    fld[1] = fld[2] = $0
    sub(/,[^,]*$/,"",fld[1])
    gsub(/^{|}$/,"",fld[1])
    sub(/.*,/,"",fld[2])
    # print "trace: " hdr[1] "=<" fld[1] ">" | "cat>&2"
    # print "trace: " hdr[2] "=<" fld[2] ">" | "cat>&2"

    numTags = split(fld[1],tags,/'[^']*'/,vals)
    delete tags[numTags--]
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        gsub(/^, *|: *$/,"",tags[tagNr])
        gsub(/^'|'$/,"",vals[tagNr])
        # print "trace:    " tagNr ": <" tags[tagNr] "=" vals[tagNr] ">" | "cat>&2"
    }
}
FNR == 2 {
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        printf "%s.%s%s", hdr[1], tags[tagNr], OFS
    }
    print hdr[2]
}
{
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        printf "\"%s\"%s", vals[tagNr], OFS
    }
    printf "\"%s\"%s", fld[2], ORS
}


$ awk -f tst.awk file
billing_info.id,billing_info.id_encrypted,billing_info.address,billing_info.phone,billing_info.country,key_id
"1B82","1Q4AW5bwyU","san jose","13423","v73jyqgE=","bf6-96f751"

The above uses GNU awk for the 4th arg to split() . Uncomment the print trace lines to see what each step is doing if you like. You don't need to add the double quotes around each output field if you remove or replace any commas within each field (esp. the address).
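
Once the data is in that shape, picking columns by header name could look roughly like this. It is a sketch only: converted.csv and the want variable are names chosen for illustration, and it assumes no field contains a comma and that every requested name exists in the header.

```shell
# converted.csv holds the converted output shown above.
cat > converted.csv <<'EOF'
billing_info.id,billing_info.id_encrypted,billing_info.address,billing_info.phone,billing_info.country,key_id
"1B82","1Q4AW5bwyU","san jose","13423","v73jyqgE=","bf6-96f751"
EOF

awk -v want="billing_info.id_encrypted,key_id" '
BEGIN { FS = OFS = ","; n = split(want, names, ",") }
FNR == 1 {
    for (i = 1; i <= NF; i++) col[$i] = i     # map header name -> column number
    print want
    next
}
{
    out = ""
    for (k = 1; k <= n; k++) {
        v = $(col[names[k]])
        gsub(/^"|"$/, "", v)                  # strip the surrounding double quotes
        out = out (k > 1 ? OFS : "") v
    }
    print out
}
' converted.csv > output.csv

cat output.csv
```

Changing the want list is enough to pull a different set of columns.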
