Extract part of one column and save into another file using awk
I have a requirement to extract fields from a CSV file. There are two columns, billing_info and key_id. billing_info is an object that holds multiple data items in curly braces. I need to extract billing_info.id_encrypted and key_id into a different file.
input.csv
billing_info,key_id
{id: '1B82', id_encrypted: '1Q4AW5bwyU', address: 'san jose', phone: '13423', country: 'v73jyqgE='},bf6-96f751
output.csv
billing_info.id_encrypted,key_id
1Q4AW5bwyU,bf6-96f751
May I know how to use an awk command to extract the data in the format shown in output.csv? Please help.
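Making some assumptions:
awk -F, '
    BEGIN {
        getline    # read and discard the input header line
        print "billing_info.id_encrypted,key_id"
    }
    {
        # scan every field except the last for the id_encrypted item
        for (i = 1; i < NF; i++)
            if ($i ~ /id_encrypted/)
                split($i, e, /\047/)    # \047 is the octal escape for a single quote
        print e[2] "," $NF
    }
' input.csv > output.csv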
Notes:
- -F, splits input lines into comma-separated fields
- the BEGIN section handles the header
- the for loop runs through all the fields (except the final one)
- ($i ~ /id_encrypted/) looks for any field that contains the keyword
- split splits that field on single quotes (/\047/)
- print outputs the value found, and the final field
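To make the split step concrete, here is a minimal sketch that runs it on one field taken from the sample row and prints every resulting array element. The helper name show_split is made up for this illustration and is not part of the answer.

```shell
# show_split is a hypothetical helper: it splits its argument on single
# quotes (octal escape \047) and prints each element of the resulting array.
show_split() {
  printf '%s\n' "$1" | awk '{
    n = split($0, e, /\047/)
    for (i = 1; i <= n; i++)
      printf "e[%d]=<%s>\n", i, e[i]
  }'
}

show_split " id_encrypted: '1Q4AW5bwyU'"
```

The quoted value lands in e[2], which is why the answer prints e[2]; e[1] holds the key text before the first quote and e[3] is the empty remainder after the closing quote.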
Here is a fast and elegant solution using awk:
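awk -F ":" 'NR == 1 { print "billing_info.id_encrypted,key_id"; next } { split($3, arr1, ","); split($6, arr2, ","); gsub(/[ \047]/, "", arr1[1]); print arr1[1] "," arr2[2] }' input.csv > output.csv
With an explanation:
- -F ":" makes : the awk field separator.
- NR == 1 { ...; next } prints the output header in place of the input header line, which contains no : at all.
- split($3, arr1, ",") splits the 3rd field on , into an array of 2 elements.
- split($6, arr2, ",") splits the 6th field on , into an array of 2 elements.
- gsub(/[ \047]/, "", arr1[1]) strips the leading space and the single quotes (\047) that surround the value.
- Then print the first element of arr1 and the second element of arr2.
It works as long as the items inside billing_info always appear in the same order and no value contains : or , itself.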
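To see why the 3rd and 6th fields are the ones of interest, a quick sketch that prints every colon-separated field of the sample data row may help. The variable row and the helper show_fields are names invented for this illustration.

```shell
# Sample data row from input.csv, held in a variable for the demonstration.
row="{id: '1B82', id_encrypted: '1Q4AW5bwyU', address: 'san jose', phone: '13423', country: 'v73jyqgE='},bf6-96f751"

# show_fields is a hypothetical helper: it prints each field awk sees
# when ":" is the field separator, wrapped in <> to expose whitespace.
show_fields() {
  printf '%s\n' "$1" | awk -F ":" '{
    for (i = 1; i <= NF; i++)
      printf "$%d=<%s>\n", i, $i
  }'
}

show_fields "$row"
```

Field 3 starts with the id_encrypted value, still carrying a leading space and its single quotes, and field 6 ends with the key_id after the last comma. That is why each of the two fields is split once more on , before printing.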
I recommend you just convert your whole input to CSV; THEN you can trivially extract whatever fields you like from it using awk, Excel, or any other tool, e.g.:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 {
    split($0,hdr)
    next
}
{
    fld[1] = fld[2] = $0
    sub(/,[^,]*$/,"",fld[1])
    gsub(/^{|}$/,"",fld[1])
    sub(/.*,/,"",fld[2])
    # print "trace: " hdr[1] "=<" fld[1] ">" | "cat>&2"
    # print "trace: " hdr[2] "=<" fld[2] ">" | "cat>&2"
    numTags = split(fld[1],tags,/'[^']*'/,vals)
    delete tags[numTags--]
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        gsub(/^, *|: *$/,"",tags[tagNr])
        gsub(/^'|'$/,"",vals[tagNr])
        # print "trace: " tagNr ": <" tags[tagNr] "=" vals[tagNr] ">" | "cat>&2"
    }
}
FNR == 2 {
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        printf "%s.%s%s", hdr[1], tags[tagNr], OFS
    }
    print hdr[2]
}
{
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        printf "\"%s\"%s", vals[tagNr], OFS
    }
    printf "\"%s\"%s", fld[2], ORS
}
$ awk -f tst.awk file
billing_info.id,billing_info.id_encrypted,billing_info.address,billing_info.phone,billing_info.country,key_id
"1B82","1Q4AW5bwyU","san jose","13423","v73jyqgE=","bf6-96f751"
The above uses GNU awk for the 4th arg to split(). Uncomment the print trace lines to see what each step is doing if you like. You don't need to add the double quotes around each output field if you remove or replace any commas within each field (esp. the address).