How to print the 3rd field in the 3rd column itself

In my file I have 3 fields. I want to print the values of the third field in the third column only, but my output ends up in the first column. Please check my file and output:

cat filename

1st field     2nd field    3rd field
---------     ---------    -----------
a,b,c,d       d,e,f,g,h    1,2,3,4,5,5

q,w,e,r       t,y,g,t,i    9,8,7,6,5,5

I'm using the following command to print the third field only in the third column:

cat filename |awk '{print $3}' |tr ',' '\n' 

Output: the third field's values are printed in the first field's place. I want them printed only in the third field's column.

first field :-
---------------
1
2
3
4
5 
5

Expected output

1st field     2nd field    3rd field
---------     ---------    -----------
a,b,c,d       d,e,f,g,h     1
                            2
                            3
                            4
                            5 
                            5

q,w,e,r       t,y,g,t,i     9
                            8
                            7
                            6 
                            5
                            5

Input

 [akshay@localhost tmp]$ cat file
 1st field     2nd field    3rd field
 ---------     ---------    -----------
 a,b,c,d       d,e,f,g,h    1,2,3,4,5,5

 q,w,e,r       t,y,g,t,i    9,8,7,6,5,5

Script

 [akshay@localhost tmp]$ cat test.awk
    NR<3 || !NF{ print; next}
    { 
        split($0,D,/[^[:space:]]*/)
        c1=sprintf("%*s",length($1),"")
        c2=sprintf("%*s",length($2),"")
        split($3,A,/,/)
        for(i=1; i in A; i++)
        {   
            if(i==2)
            {
                $1 = c1
                $2 = c2
            }
            printf("%s%s%s%s%d\n",$1,D[2],$2,D[3],A[i]) 
        }
     }

Output

 [akshay@localhost tmp]$ awk -f test.awk file
 1st field     2nd field    3rd field
 ---------     ---------    -----------
 a,b,c,d       d,e,f,g,h    1
                            2
                            3
                            4
                            5
                            5

 q,w,e,r       t,y,g,t,i    9
                            8
                            7
                            6
                            5
                            5

Explanation

  • NR<3 || !NF{ print; next}

NR holds the number of records read so far; in short, it is the current line number.

NF holds the number of fields in the current record.

The next statement forces awk to immediately stop processing the current record and go on to the next record.

If the line number is less than 3 or NF is zero (a blank line, which has no fields), print the current record and go to the next record.

  • split($0,D,/[^[:space:]]*/)

Since we want to preserve the formatting, we save the separators between the fields in array D here. If you have GNU awk, you can use the 4th argument of split() instead: it splits the line into 2 arrays, one holding the fields and the other holding the separators between them. You can then operate on the fields array and print it with the matching separators array element between each pair of fields to rebuild the original $0.
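As a rough illustration of the 4-argument split() mentioned here, the sketch below rebuilds each row from a fields array and a separators array. It assumes GNU awk is installed as gawk; expand_third, flds, seps and vals are names made up for this example, not part of the answer's script.

```shell
# Sketch only: needs GNU awk, since the 4th argument of split() is a gawk extension.
# expand_third is a hypothetical helper name.
expand_third() {
    gawk '
    NR < 3 || !NF { print; next }        # header lines and blank lines pass through
    {
        # flds[1..n] hold the fields, seps[i] the whitespace between flds[i]
        # and flds[i+1]; with " " as separator, seps[0] is any leading whitespace
        split($0, flds, " ", seps)
        m = split(flds[3], vals, ",")
        for (i = 1; i <= m; i++) {
            print seps[0] flds[1] seps[1] flds[2] seps[2] vals[i]
            # after the first row, show the first two fields as blanks
            gsub(/./, " ", flds[1])
            gsub(/./, " ", flds[2])
        }
    }' "$@"
}
```

It would be called as expand_third file, or fed from a pipe.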

  • c1=sprintf("%*s",length($1),"") and c2=sprintf("%*s",length($2),"")

Here sprintf builds a string of spaces that is as long as the field ($1 or $2).
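To see what the "%*s" format does on its own, here is a small sketch; pad_demo is a hypothetical helper, and the brackets are only there to make the blanks visible.

```shell
# Sketch: print a run of spaces as long as the given string, inside brackets.
# pad_demo is a hypothetical name used only for this illustration.
pad_demo() {
    awk -v s="$1" 'BEGIN {
        # "%*s" takes the width from the argument list: here length(s)
        blank = sprintf("%*s", length(s), "")
        printf "[%s]\n", blank
    }'
}

pad_demo a,b,c,d    # prints [       ] (seven spaces, one per character)
```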

  • split($3,A,/,/)

split(string, array [, fieldsep [, seps ] ])

Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created.

The loop runs as long as i in A is true. I just learned that i=1 and i++ control the order in which the array is traversed, thanks to Ed Morton.
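A short sketch of that point: walking the array by index from 1 up to the count returned by split() gives the pieces in their original order, whereas the order of a for (k in A) loop is unspecified in awk. split_in_order is a hypothetical helper name.

```shell
# Sketch: split a comma-separated string and print the pieces in index order.
# split_in_order is a hypothetical name used only for this illustration.
split_in_order() {
    awk -v s="$1" 'BEGIN {
        n = split(s, A, ",")              # n is the number of pieces
        for (i = 1; i <= n; i++)          # index loop keeps the original order
            printf "%s%s", A[i], (i < n ? " " : "\n")
    }'
}

split_in_order 9,8,7,6,5,5    # prints 9 8 7 6 5 5
```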

  • if(i==2) { $1 = c1; $2 = c2 }

When i = 1 we print a,b,c,d and d,e,f,g,h. From the next iteration on, $1 and $2 hold the blank strings c1 and c2 created above, since you want these fields shown only once.

  • printf("%s%s%s%s%d\n",$1,D[2],$2,D[3],A[i])

Finally, print field1 ($1), the separator between field1 and field2 that we saved above (D[2]), field2 ($2), the separator between field2 and field3 (D[3]), and one element of array A at a time, which we created with split($3,A,/,/).

$ cat tst.awk
NR<3 || !NF { print; next }
{
    front = gensub(/((\S+\s+){2}).*/,"\\1","")
    split($3,a,/,/)
    for (i=1;i in a;i++) {
        print front a[i]
        gsub(/\S/," ",front)
    }
}

$ awk -f tst.awk file
1st field     2nd field    3rd field
---------     ---------    -----------
a,b,c,d       d,e,f,g,h    1
                           2
                           3
                           4
                           5
                           5

q,w,e,r       t,y,g,t,i    9
                           8
                           7
                           6
                           5
                           5

The above uses GNU awk for gensub(); with other awks, use match()+substr(). It also uses the \S and \s shorthands for [^[:space:]] and [[:space:]].
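For awks without gensub(), the match()+substr() route could look roughly like the sketch below. It is an adaptation, not the answer's own code: expand_third_posix is a hypothetical name, and [ \t] stands in for the \s shorthand on the assumption that the columns are separated by spaces or tabs.

```shell
# Sketch: gensub()-free variant using match() and substr().
# expand_third_posix is a hypothetical helper name.
expand_third_posix() {
    awk '
    NR < 3 || !NF { print; next }        # header lines and blank lines pass through
    {
        # match the first two fields together with the whitespace after each
        match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/)
        front = substr($0, 1, RLENGTH)
        n = split($3, a, ",")
        for (i = 1; i <= n; i++) {
            print front a[i]
            gsub(/[^ \t]/, " ", front)   # blank out the text for later rows
        }
    }' "$@"
}
```

It would be called as expand_third_posix file. Like the gensub() version, it prints nothing for a line that has fewer than three fields.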

Considering the columns are tab separated, I would say:

awk 'BEGIN{FS=OFS="\t"}
     NR<=2 || !NF {print; next}
     NR>2{n=split($3,a,",")
          for (i=1;i<=n; i++)
              print (i==1?$1 OFS $2:"" OFS ""), a[i]
         }' file

  • This prints the 1st and 2nd lines and any empty lines unchanged.
  • Then it splits the 3rd field, using the comma as the separator.
  • Finally, it loops through the pieces and prints one per line: the first two columns appear only on the first line, and the following lines carry just the next value.

Test

$ awk 'BEGIN{FS=OFS="\t"} NR<=2 || !NF {print; next} NR>2{n=split($3,a,","); for (i=1;i<=n; i++) print (i==1?$1 OFS $2:"" OFS ""), a[i]}' a
1st field   2nd field   3rd field
---------   ---------   -----------
a,b,c,d d,e,f,g,h   1
        2
        3
        4
        5
        5

q,w,e,r t,y,g,t,i   9
        8
        7
        6
        5
        5

Note the output is a bit ugly, because the tabs that separate the columns make them line up like this.
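If the alignment matters, one possible workaround is to pad the first two columns to fixed widths with printf instead of printing tabs. This is only a sketch: expand_third_padded is a hypothetical name, the widths 14 and 13 are assumptions taken from the sample layout, and every non-empty line is assumed to have three tab-separated fields.

```shell
# Sketch: tab-separated input, space-padded output.
# expand_third_padded and the widths w1/w2 are assumptions for this example.
expand_third_padded() {
    awk -F '\t' -v w1=14 -v w2=13 '
    !NF { print; next }                  # keep blank lines as they are
    {
        n = split($3, a, ",")
        for (i = 1; i <= n; i++)
            # "%-*s" left-justifies in a width taken from the argument list
            printf "%-*s%-*s%s\n", w1, (i == 1 ? $1 : ""), w2, (i == 1 ? $2 : ""), a[i]
    }' "$@"
}
```

The header lines need no special case here, since their third field contains no comma and is printed once after the padded first two columns.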
