How to print the 3rd field in the 3rd column itself
My file has 3 fields. I want to print the values of the third field in the third column only, but the output ends up in the first column instead. Please check my file and output:
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1,2,3,4,5,5
q,w,e,r t,y,g,t,i 9,8,7,6,5,5
I'm using the following command to print the third field only in the third column:
cat filename |awk '{print $3}' |tr ',' '\n'
The output prints the third field's values in the first field's place; I want them printed only in the third field's column:
first field :-
---------------
1
2
3
4
5
5
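To see why the values land at the left margin: awk '{print $3}' throws away the first two fields together with the spacing in front of the third, and tr then turns every comma into a newline. A minimal reproduction, assuming one sample row passed in through printf instead of a file (the variable names row and third are only for illustration):

```shell
# One data row from the question, fed through the same pipeline.
row='a,b,c,d d,e,f,g,h 1,2,3,4,5,5'

# awk keeps only the third field; the text that preceded it is gone here.
third=$(printf '%s\n' "$row" | awk '{print $3}')
printf '%s\n' "$third"

# tr replaces each comma with a newline, so every value starts a new line
# at column 0, with nothing left to place it under the third column.
printf '%s\n' "$third" | tr ',' '\n'
```

Any fix therefore has to keep (or rebuild) the text in front of the third field, which is what the answers do.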
Expected output:
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1
                  2
                  3
                  4
                  5
                  5
q,w,e,r t,y,g,t,i 9
                  8
                  7
                  6
                  5
                  5
Input
[akshay@localhost tmp]$ cat file
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1,2,3,4,5,5
q,w,e,r t,y,g,t,i 9,8,7,6,5,5
Script
[akshay@localhost tmp]$ cat test.awk
NR<3 || !NF{ print; next}
{
    split($0,D,/[^[:space:]]*/)
    c1=sprintf("%*s",length($1),"")
    c2=sprintf("%*s",length($2),"")
    split($3,A,/,/)
    for(i=1; i in A; i++)
    {
        if(i==2)
        {
            $1 = c1
            $2 = c2
        }
        printf("%s%s%s%s%d\n",$1,D[2],$2,D[3],A[i])
    }
}
Output
[akshay@localhost tmp]$ awk -f test.awk file
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1
                  2
                  3
                  4
                  5
                  5
q,w,e,r t,y,g,t,i 9
                  8
                  7
                  6
                  5
                  5
Explanation
NR<3 || !NF{ print; next}
NR is the number of records read so far, which here is the current line number.
NF is the number of fields in the current record.
The next statement forces awk to immediately stop processing the current record and go on to the next record.
So: if the line number is less than 3, or the record has no fields (NF is 0, that is, a blank line), print the current record unchanged and go to the next record.
split($0,D,/[^[:space:]]*/)
Because we want to preserve the formatting, we save the separators between the fields in array D. Splitting $0 on runs of non-space characters leaves only the whitespace pieces: D[2] is the separator between fields 1 and 2, and D[3] is the separator between fields 2 and 3. If you have GNU awk, you can use the 4th argument of split() instead: it lets you split the line into 2 arrays, one of the fields and the other of the separators between the fields, so you can operate on the fields array and print using the separators array between each field array element to rebuild the original $0.
c1=sprintf("%*s",length($1),"")
c2=sprintf("%*s",length($2),"")
Here sprintf() is used to build a string of spaces that is as long as the field ($1 or $2).
split($3,A,/,/)
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created.
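As a quick illustration of the three-argument form and its return value, here is a sketch using one of the sample third-field values as a literal string (the variable names n and result are only for illustration):

```shell
# split() fills A[1..n] and returns n, the number of pieces.
result=$(awk 'BEGIN {
    n = split("9,8,7,6,5,5", A, ",")
    printf "%d:", n
    for (i = 1; i <= n; i++) printf " %s", A[i]
    print ""
}')
printf '%s\n' "$result"
```

This prints the count followed by the pieces in order. Looping up to the returned count is an alternative to the i in A test used in the script.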
The loop runs as long as i in A is true. I just learned that i=1 and i++ control the order in which the array is traversed, thanks to Ed Morton.
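A short sketch of what that loop condition does, assuming a made-up four-element array with one element deleted on purpose. The counted loop visits indices in numeric order and stops at the first index that is missing, whereas the order of a plain for (k in A) loop is not specified by awk:

```shell
# Counted traversal: starts at 1, goes up by one, ends at the first gap.
ordered=$(awk 'BEGIN {
    split("9,8,7,6", A, ",")
    delete A[3]                                # leave a gap at index 3
    for (i = 1; i in A; i++)
        printf "%s%s", (i > 1 ? " " : ""), A[i]
}')
printf '%s\n' "$ordered"
```

Only the first two elements are printed here, because the test i in A fails at index 3. Arrays filled by split() have no gaps, so in the script the loop reaches every element.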
if(i==2) { $1 = c1; $2 = c2 }
When i = 1 we print a,b,c,d and d,e,f,g,h. From the next iteration on, $1 and $2 are replaced with the c1 and c2 strings created above, since you want those fields shown only once.
printf("%s%s%s%s%d\n",$1,D[2],$2,D[3],A[i])
Finally, print field 1 ($1), the separator between field 1 and field 2 that we saved above (D[2]), field 2 ($2), the separator between field 2 and field 3 (D[3]), and the elements of array A one at a time, which we created with split($3,A,/,/). Note that %d prints each element as an integer, which suits the numeric sample data; non-numeric values would need %s.
$ cat tst.awk
NR<3 || !NF { print; next }
{
    front = gensub(/((\S+\s+){2}).*/,"\\1","")
    split($3,a,/,/)
    for (i=1;i in a;i++) {
        print front a[i]
        gsub(/\S/," ",front)
    }
}
$ awk -f tst.awk file
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1
                  2
                  3
                  4
                  5
                  5
q,w,e,r t,y,g,t,i 9
                  8
                  7
                  6
                  5
                  5
The above uses GNU awk for gensub(); with other awks, use match()+substr(). It also uses the \S and \s shorthands for [^[:space:]] and [[:space:]].
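For awks without gensub(), one possible match()+substr() version might look like the sketch below. It is not from the answer itself: the wrapper name expand_third_field is hypothetical, the columns are assumed to be separated by spaces or tabs, and [ \t] is written out in place of \s and [[:space:]] so that older awks accept it.

```shell
# Hypothetical helper; reads the files named in "$@" or standard input.
expand_third_field() {
    awk '
    NR < 3 || !NF { print; next }          # headers and blank lines pass through
    {
        # Match everything up to the start of the third field:
        # optional leading blanks, then two runs of "non-blanks + blanks".
        if (match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/) == 0) { print; next }
        front = substr($0, 1, RLENGTH)
        n = split($3, a, ",")
        for (i = 1; i <= n; i++) {
            print front a[i]
            gsub(/[^ \t]/, " ", front)     # blank out fields 1 and 2 after first use
        }
    }' "$@"
}

printf '%s\n' \
    '1st field 2nd field 3rd field' \
    '--------- --------- -----------' \
    'a,b,c,d d,e,f,g,h 1,2,3,4,5,5' | expand_third_field
```

Lines with fewer than three fields are printed unchanged here, which is a choice made for this sketch rather than something the original script does.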
Assuming the columns are tab-separated, I would say:
awk 'BEGIN{FS=OFS="\t"}
NR<=2 || !NF {print; next}
NR>2{n=split($3,a,",")
for (i=1;i<=n; i++)
print (i==1?$1 OFS $2:"" OFS ""), a[i]
}' file
$ awk 'BEGIN{FS=OFS="\t"} NR<=2 || !NF {print; next} NR>2{n=split($3,a,","); for (i=1;i<=n; i++) print (i==1?$1 OFS $2:"" OFS ""), a[i]}' a
1st field 2nd field 3rd field
--------- --------- -----------
a,b,c,d d,e,f,g,h 1
2
3
4
5
5
q,w,e,r t,y,g,t,i 9
8
7
6
5
5
Note that the output looks a bit ugly, because the tab characters separating the columns position them like this.