
Converting Rows into Columns using awk or sed

I have a file in *.xvg format.
It contains six columns of 500 numbers each.
Except for the time column (the first column), all columns contain floats.

I want to generate an output file in the same format, in which these columns are converted into rows, with the numbers separated by spaces.

I have written a program in C that works fine for me, but I am looking for an alternative using awk or sed that does the same thing.

I am completely new to these scripting languages, and I couldn't find a relevant answer in previously asked questions. If somebody can help me with this task, I will be grateful.

The input file looks like this:

  # This file was created Thu Oct  1 17:18:10 2015
  # by the following command:
  # /home/durba/gmx455/bin/mdrun -np 1 -deffnm md0 -v 
  #
  @    title "dH/d\xl\f{}, \xD\f{}H"
  @    xaxis  label "Time (ps)"
  @    yaxis  label "(kJ/mol)"
  @TYPE xy
  @ subtitle "T = 200 (K), \xl\f{} = 0"
  @ view 0.15, 0.15, 0.75, 0.85
  @ legend on
  @ legend box on
  @ legend loctype view
  @ legend 0.78, 0.8
  @ legend length 2
  @ s0 legend "dH/d\xl\f{} \xl\f{} 0"
  @ s1 legend "\xD\f{}H \xl\f{} 0.05"
  0  19.3191 1.16531   1.8   -447.07  -47.07
  2 -447.072 -17.6454  1.5   -17.633  -1.33
  4 -17.633 -0.446508  1.3   -75.455  -5.45
  6 -75.4555 -2.83981  1.4   -28.724  -28.4
  8 -28.7246 -0.884639 1.5   -41.877  -14.87
  10 -41.8779 -1.45569  2.8   -43.685  -3.685
  12 -43.6851 -1.4797   -3.1  -91.651  -91.651
  14 -91.6515 -3.52492  -3.5  -61.135  -1.135
  16 -61.1356 -2.30129  -3.2  -48.847  -48.47

The output file should look like this:

  # This file was created Thu Oct  1 17:18:10 2015
  # by the following command:
  # /home/durba/gmx455/bin/mdrun -np 1 -deffnm md0 -v 
  #
  @    title "dH/d\xl\f{}, \xD\f{}H"
  @    xaxis  label "Time (ps)"
  @    yaxis  label "(kJ/mol)"
  @TYPE xy
  @ subtitle "T = 200 (K), \xl\f{} = 0"
  @ view 0.15, 0.15, 0.75, 0.85
  @ legend on
  @ legend box on
  @ legend loctype view
  @ legend 0.78, 0.8
  @ legend length 2
  @ s0 legend "dH/d\xl\f{} \xl\f{} 0"
  @ s1 legend "\xD\f{}H \xl\f{} 0.05"
  0 2 4 6 8 10 12 14 16
  19.3191 -447.072 -17.633 -75.4555 -28.7246 -41.8779 -43.6851 -91.6515 -61.1356
  1.16531 -17.6454 -0.446508 -2.83981 -0.884639 -1.45569 -1.4797 -3.52492 -2.30129
  1.8 1.5 1.3 1.4 1.5 2.8 -3.1 -3.5 -3.2
  -447.07 -17.633 -75.455 -28.724 -41.877 -43.685 -91.651 -61.135 -48.847
  -47.07 -1.33 -5.45 -28.4 -14.87 -3.685 -91.651 -1.135 -48.47

Please note that lines starting with "#" and "@" should be identical in both files.

Answer for original question

Let's consider this test file:

$ cat file
123 1.2 1.3 1.4 1.5
124 2.2 2.3 2.4 2.5
125 3.2 3.3 3.4 3.5

To convert columns to rows:

$ awk '{for (i=1;i<=NF;i++)a[i,NR]=$i} END{for (i=1;i<=NF;i++) for (j=1;j<=NR;j++) printf "%s%s",a[i,j],(j==NR?ORS:OFS)}' file
123 124 125
1.2 2.2 3.2
1.3 2.3 3.3
1.4 2.4 3.4
1.5 2.5 3.5

How it works

  • for (i=1;i<=NF;i++)a[i,NR]=$i

    As we loop through each line, we save every field in array a, indexed by field number i and line number NR.

  • END{for (i=1;i<=NF;i++) for (j=1;j<=NR;j++) printf "%s%s",a[i,j],(j==NR?ORS:OFS)}

    After we reach the end of the file, we print each value, followed by the output field separator (OFS) if we are in the middle of a line or by the output record separator (ORS) if we are at the end of a line.
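
One detail worth noting: inside END, NF reflects only the last line read, so the loop above assumes every line has the same number of fields. A minimal sketch of a variant that records the widest row instead is shown here; the variable name nf and the three-line sample input are assumptions made for illustration, not part of the original answer.

```shell
# Transpose a small sample, remembering the largest field count seen.
transposed=$(printf '%s\n' '123 1.2 1.3' '124 2.2 2.3' '125 3.2 3.3' | awk '
{
  if (NF > nf) nf = NF          # nf (hypothetical name): widest row so far
  for (i = 1; i <= NF; i++)
    a[i, NR] = $i               # store as a[column, row]
}
END {
  for (i = 1; i <= nf; i++)     # one output line per input column
    for (j = 1; j <= NR; j++)   # one output field per input line
      printf "%s%s", a[i, j], (j == NR ? ORS : OFS)
}')

printf '%s\n' "$transposed"
# 123 124 125
# 1.2 2.2 3.2
# 1.3 2.3 3.3
```

If a row is shorter than the others, its missing cells print as empty strings, because awk returns an empty value for array elements that were never set.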

Multi-line version

If you prefer your code spread over several lines:

awk '
{
  for (i=1;i<=NF;i++)
    a[i,NR]=$i
}

END{
  for (i=1;i<=NF;i++)
    for (j=1;j<=NR;j++)
      printf "%s%s",a[i,j],(j==NR?ORS:OFS)
}
' file

Answer for revised question

In the revised question, there are lines at the beginning of the file that start with @ or # and must be left unchanged. In this case:

$ awk '/^[@#]/{print;next}{k++; for (i=1;i<=NF;i++)a[i,k]=$i;} END{for (i=1;i<=NF;i++) for (j=1;j<=k;j++) printf "%s%s",a[i,j],(j==k?ORS:OFS)}' input
# This file was created Thu Oct  1 17:18:10 2015
# by the following command:
# /home/durba/gmx455/bin/mdrun -np 1 -deffnm md0 -v 
#
@    title "dH/d\xl\f{}, \xD\f{}H"
@    xaxis  label "Time (ps)"
@    yaxis  label "(kJ/mol)"
@TYPE xy
@ subtitle "T = 200 (K), \xl\f{} = 0"
@ view 0.15, 0.15, 0.75, 0.85
@ legend on
@ legend box on
@ legend loctype view
@ legend 0.78, 0.8
@ legend length 2
@ s0 legend "dH/d\xl\f{} \xl\f{} 0"
@ s1 legend "\xD\f{}H \xl\f{} 0.05"
0 2 4 6 8 10 12 14 16
19.3191 -447.072 -17.633 -75.4555 -28.7246 -41.8779 -43.6851 -91.6515 -61.1356
1.16531 -17.6454 -0.446508 -2.83981 -0.884639 -1.45569 -1.4797 -3.52492 -2.30129
1.8 1.5 1.3 1.4 1.5 2.8 -3.1 -3.5 -3.2
-447.07 -17.633 -75.455 -28.724 -41.877 -43.685 -91.651 -61.135 -48.847
-47.07 -1.33 -5.45 -28.4 -14.87 -3.685 -91.651 -1.135 -48.47
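
To try the header-preserving approach without a real .xvg file, the sketch below runs the same logic on a tiny made-up input. The sample data, the nf variable, and the rule that skips blank lines are assumptions added for illustration; the header test and the row counter k follow the command above.

```shell
# Hypothetical five-line input: two header lines, three data rows.
sample='# comment line
@ legend on
0 19.3 1.1
2 -447.0 -17.6
4 -17.6 -0.4'

result=$(printf '%s\n' "$sample" | awk '
/^[@#]/ { print; next }         # header lines pass through unchanged
NF == 0 { next }                # assumption: blank lines are ignored
{
  k++                           # k counts data rows only, unlike NR
  if (NF > nf) nf = NF
  for (i = 1; i <= NF; i++)
    a[i, k] = $i
}
END {
  for (i = 1; i <= nf; i++)
    for (j = 1; j <= k; j++)
      printf "%s%s", a[i, j], (j == k ? ORS : OFS)
}')

printf '%s\n' "$result"
# # comment line
# @ legend on
# 0 2 4
# 19.3 -447.0 -17.6
# 1.1 -17.6 -0.4
```

Because header lines are printed as soon as they are read while the data is printed only in END, any @ or # line that appears after the data would still come out before the transposed rows.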

This might work for you (GNU sed):

sed -r 'H;$!d;x;:a;h;s/\n(\S+)[^\n]*/\1 /g;s/ $//p;g;s/\n\S+ ?/\n/g;ta;d' file

Slurp the file into the hold space (HS), deleting the pattern space (PS) for each line until end-of-file is reached. At end-of-file, swap the HS into the PS. Copy the PS to the HS, then globally replace each newline and the line that follows it with just that line's first field followed by a space. Remove the trailing space and print the result. Then restore the copy from the HS and do the inverse: delete the first field of every line. If any of the substitutions succeeded, repeat the process until nothing but newlines remains. Finally, delete the leftover newlines.
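
The loop described above can be followed on a two-line input. This is only an illustrative run that assumes GNU sed is installed as sed; the sample data is made up, and the comments trace what the pattern space holds at each pass.

```shell
result=$(printf '%s\n' '123 1.2 1.3' '124 2.2 2.3' |
  sed -r 'H;$!d;x;:a;h;s/\n(\S+)[^\n]*/\1 /g;s/ $//p;g;s/\n\S+ ?/\n/g;ta;d')

# After slurping, PS = "\n123 1.2 1.3\n124 2.2 2.3"
# Pass 1: keep first fields -> "123 124" (printed); strip them -> "\n1.2 1.3\n2.2 2.3"
# Pass 2: prints "1.2 2.2"; remainder -> "\n1.3\n2.3"
# Pass 3: prints "1.3 2.3"; remainder -> "\n\n"
# Pass 4: nothing matches, so ta does not branch and d discards the newlines

printf '%s\n' "$result"
# 123 124
# 1.2 2.2
# 1.3 2.3
```

Since the whole file is held in memory and rescanned once per column, this approach is best suited to modestly sized inputs.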

The question has changed since I first answered it. The new solution below handles the revised question using essentially the same method:

sed -r '/^[0-9]/{s/ +/ /g;H};//!p;$!d;x;:a;h;s/\n(\S+)[^\n]*/\1 /g;s/ $//p;g;s/\n\S+ ?/\n/g;ta;d' file
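
As a quick check of the revised command, here is a sketch that feeds it a small made-up input with two header lines, again assuming GNU sed. Note that the command only treats lines beginning with a digit as data, so a data line starting with a minus sign or with leading whitespace would be handled as a header; that limitation is an observation about the pattern /^[0-9]/, not something stated in the answer.

```shell
# Hypothetical four-line input; the second data line has a double space
# to show that runs of spaces are squeezed before transposing.
sample='# comment
@ legend on
0 1.5  2.5
2 3.5  4.5'

result=$(printf '%s\n' "$sample" |
  sed -r '/^[0-9]/{s/ +/ /g;H};//!p;$!d;x;:a;h;s/\n(\S+)[^\n]*/\1 /g;s/ $//p;g;s/\n\S+ ?/\n/g;ta;d')

printf '%s\n' "$result"
# # comment
# @ legend on
# 0 2
# 1.5 3.5
# 2.5 4.5
```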
