I have a requirement to extract fields from a CSV file. There are two columns, billing_info and key_id. billing_info is an object which has multiple data items in curly braces. I need to extract billing_info.id_encrypted and key_id into a different file.
input.csv
billing_info,key_id
{id: '1B82', id_encrypted: '1Q4AW5bwyU', address: 'san jose', phone: '13423', country: 'v73jyqgE='},bf6-96f751
output.csv
billing_info.id_encrypted,key_id
1Q4AW5bwyU,bf6-96f751
May I know how to use the awk command to extract the data in the format shown in output.csv? Please help.
Making some assumptions:
<csvfile awk -F, '
BEGIN {
    getline                     # read and discard the original header line
    print "billing_info.id_encrypted,key_id"
}
{
    for (i=1; i<NF; i++)
        if ($i ~ /id_encrypted/)
            split($i, e, /\047/)
    print e[2] "," $NF
}
'
Notes:
-F, splits input lines into comma-separated fields
BEGIN section handles the header
for loop runs through all the fields (except the final one)
($i ~ /id_encrypted/) looks for any that contain the key word
split splits that field on single-quotes (/\047/)
print outputs the value found, and the final field
Here is a fast and elegant solution using awk:
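awk -F ":" 'NR==1{print "billing_info.id_encrypted,key_id";next} {split($3,arr1,",");split($6,arr2,",");gsub(/^ *\047|\047$/,"",arr1[1]);print arr1[1] "," arr2[2]}' input.csv > output.csv
With an explanation:
-F ":" makes : the awk field separator.
NR==1{...;next} prints the new header in place of the original header line.
split($3,arr1,",") splits the 3rd field on , into an array of 2 elements.
split($6,arr2,",") splits the 6th field on , into an array of 2 elements.
gsub(/^ *\047|\047$/,"",arr1[1]) strips the leading space and the surrounding single quotes (\047) from the first element of arr1.
Then print the first element of arr1 and the second element of arr2.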
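The one-liner above depends on id_encrypted being the third colon-separated field. If the order of the tags inside billing_info can vary, a position-independent variant might look like the sketch below. It is only an illustration under some assumptions: values are wrapped in single quotes and contain no embedded single quotes, key_id is whatever follows the last comma, and the sample input is rewritten to input.csv purely so the example is self-contained.

```shell
# Recreate the sample input from the question (for a self-contained run).
cat > input.csv <<'EOF'
billing_info,key_id
{id: '1B82', id_encrypted: '1Q4AW5bwyU', address: 'san jose', phone: '13423', country: 'v73jyqgE='},bf6-96f751
EOF

awk '
NR == 1 { print "billing_info.id_encrypted,key_id"; next }  # replace the header
{
    key = $0
    sub(/.*,/, "", key)                  # key_id: everything after the last comma
    val = ""
    # \047 is the octal escape for a single quote
    if (match($0, /id_encrypted: *\047[^\047]*\047/)) {
        val = substr($0, RSTART, RLENGTH)
        sub(/^[^\047]*\047/, "", val)    # drop the tag name and the opening quote
        sub(/\047$/, "", val)            # drop the closing quote
    }
    print val "," key
}
' input.csv > output.csv
```

Because the value is located by its tag name with match(), the script does not care where id_encrypted appears inside the braces. A row without that tag produces an empty first column.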
I recommend you just convert your whole input to CSV and THEN you can trivially extract whatever fields you like from it using awk or Excel or any other tool, e.g.:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 {
    split($0,hdr)
    next
}
{
    fld[1] = fld[2] = $0
    sub(/,[^,]*$/,"",fld[1])
    gsub(/^{|}$/,"",fld[1])
    sub(/.*,/,"",fld[2])
    # print "trace: " hdr[1] "=<" fld[1] ">" | "cat>&2"
    # print "trace: " hdr[2] "=<" fld[2] ">" | "cat>&2"
    numTags = split(fld[1],tags,/'[^']*'/,vals)
    delete tags[numTags--]
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        gsub(/^, *|: *$/,"",tags[tagNr])
        gsub(/^'|'$/,"",vals[tagNr])
        # print "trace: " tagNr ": <" tags[tagNr] "=" vals[tagNr] ">" | "cat>&2"
    }
}
FNR == 2 {
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        printf "%s.%s%s", hdr[1], tags[tagNr], OFS
    }
    print hdr[2]
}
{
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        printf "\"%s\"%s", vals[tagNr], OFS
    }
    printf "\"%s\"%s", fld[2], ORS
}
$ awk -f tst.awk file
billing_info.id,billing_info.id_encrypted,billing_info.address,billing_info.phone,billing_info.country,key_id
"1B82","1Q4AW5bwyU","san jose","13423","v73jyqgE=","bf6-96f751"
The above uses GNU awk for the 4th arg to split(). Uncomment the print trace lines to see what each step is doing if you like. You don't need to add the double quotes around each output field if you remove or replace any commas within each field (esp. the address).
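For an awk without the 4-argument split(), roughly the same conversion can be sketched with match() in a loop. This is an illustration under assumptions, not the script above: it assumes tag names consist of letters, digits and underscores, values are single-quoted with no embedded single quotes, and the file names input.csv and converted.csv are placeholders chosen for the example.

```shell
# Recreate the sample input from the question (for a self-contained run).
cat > input.csv <<'EOF'
billing_info,key_id
{id: '1B82', id_encrypted: '1Q4AW5bwyU', address: 'san jose', phone: '13423', country: 'v73jyqgE='},bf6-96f751
EOF

awk '
FNR == 1 { split($0, hdr, ","); next }   # remember the two header names
{
    obj = key = $0
    sub(/,[^,]*$/, "", obj)              # the {...} object
    sub(/.*,/, "", key)                  # the trailing key_id value
    numTags = 0
    # Peel tag/value pairs off the front of obj one at a time.
    # \047 is the octal escape for a single quote.
    while (match(obj, /[A-Za-z_][A-Za-z_0-9]*: *\047[^\047]*\047/)) {
        pair = substr(obj, RSTART, RLENGTH)
        obj  = substr(obj, RSTART + RLENGTH)
        tag = val = pair
        sub(/:.*/, "", tag)              # keep only the tag name
        sub(/^[^\047]*\047/, "", val)    # drop the tag name and the opening quote
        sub(/\047$/, "", val)            # drop the closing quote
        tags[++numTags] = tag
        vals[numTags] = val
    }
    if (FNR == 2) {                      # build the header from the first data row
        for (i = 1; i <= numTags; i++) printf "%s.%s,", hdr[1], tags[i]
        print hdr[2]
    }
    for (i = 1; i <= numTags; i++) printf "\"%s\",", vals[i]
    printf "\"%s\"\n", key
}
' input.csv > converted.csv
```

As with the GNU awk version, the header is derived from the tags found in the first data row, so rows with a different set of tags would not line up with it.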