
Manipulate data in Awk

I am new to Awk programming. I have a question about manipulating a text file that is needed to draw certain network-based images in a visualization tool (Circos, http://circos.ca).

I have input data whose values I want to manipulate using awk/grep/sed. There are 9 pairs (18 lines): 5 pairs (the first 10 lines) are for "from=ABCB11", and 4 pairs (the next 8 lines) are for "from=ABCC8". What I want is to extract the value from the first line of the first pair and substitute it into the first line of each of the remaining pairs. For "from=ABCB11" the first group-2 value is 9 10, and it should replace every later occurrence of the group-2 value in that set. For example, the next group-2 value is 28 29, which should be replaced by 9 10.

Where to stop should be determined by "from=name", which here is "from=ABCB11". The rows whose value is captured and then substituted into later occurrences do not have to belong to group-2 as they do in this instance. They could belong to group-3, group-4, and so on up to group-10. So the second set ("from=ABCC8") could have belonged to group-4, 5 or 6, not necessarily group-2. It is just a coincidence here.

group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 28 29 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-5 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-2 29 30 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM2,toid=115,use=1,z=1
group-5 1 2 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM2,toid=115,use=1,z=1
group-2 10 11 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=DRD2,toid=158,use=1,z=1
group-3 1 2 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=DRD2,toid=158,use=1,z=1
group-2 11 12 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=EGF,toid=164,use=1,z=1
group-3 2 3 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=EGF,toid=164,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 12 13 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 0 1 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
group-1 0 1 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
group-2 1 2 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1B,toid=22,use=1,z=1
group-1 1 2 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1B,toid=22,use=1,z=1
group-2 2 3 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1D,toid=23,use=1,z=1
group-1 2 3 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1D,toid=23,use=1,z=1

Below is the final output I am looking for:

group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-5 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM2,toid=115,use=1,z=1
group-5 1 2 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM2,toid=115,use=1,z=1
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=DRD2,toid=158,use=1,z=1
group-3 1 2 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=DRD2,toid=158,use=1,z=1
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=EGF,toid=164,use=1,z=1
group-3 2 3 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=EGF,toid=164,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 12 13 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
group-1 0 1 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1B,toid=22,use=1,z=1
group-1 1 2 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1B,toid=22,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1D,toid=23,use=1,z=1
group-1 2 3 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1D,toid=23,use=1,z=1

Also, this is just sample data. Many pairs would have group-1, group-4, group-5, and so on up to group-10. Here, only pairs from the lower-numbered groups are shown.

I want to loop through the lines for as long as the value in "from=name" stays the same, so that I can change the value on each alternate line. Code:

awk -F, 'NR%2==1 {split($2,a,"="); print a[2]}' file.txt

The above code picks out the alternate (odd-numbered) lines and prints the "name" from "from=name".

The following is quite verbose (I love verbose variable names). Using your sample data, I get the output you want. This assumes that every odd-numbered line, counted from the start of its "from=xxxx" set, takes its values from the first line with the same "from=xxxx" information.

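awk '
  BEGIN {
    namevar=""
    val1var=""
    val2var=""
    linenum=0
  }
  {
    # split the line on whitespace, then the 5th column on commas
    split($0, linearr)
    split(linearr[5], csvarr, ",")
    # csvarr[2] holds "from=NAME"; a new NAME starts a new set
    if (namevar != csvarr[2]) {
      namevar=csvarr[2]
      val1var=linearr[2]
      val2var=linearr[3]
      linenum=0
    }
    linenum+=1
    if (linenum%2==1) {
      # odd line within the set: reuse the values saved from its first line
      print linearr[1], val1var, val2var, linearr[4], linearr[5]
    } else {
      # even line within the set: print unchanged
      print linearr[1], linearr[2], linearr[3], linearr[4], linearr[5]
    }
  }' file.txt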
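
The same idea can probably be written more compactly by assigning to the fields directly instead of copying them into arrays. This is only a sketch under a few assumptions: fields are separated by single spaces (awk rebuilds the line with single spaces once a field is assigned), "from=NAME" is always the second comma-separated item in column 5, and each "from=" set contains complete pairs. The file name sample.txt and the variable names are made up for illustration; the 8 lines are an excerpt of the sample data in the question.

```shell
# sample.txt is a hypothetical excerpt: two pairs per "from=" set
cat > sample.txt <<'EOF'
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 28 29 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-5 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 12 13 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 0 1 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
group-1 0 1 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
EOF

result=$(awk '
{
  # kv[2] is the "from=NAME" item of the comma-separated 5th column
  split($5, kv, ",")
  # a new NAME starts a new set: remember its first pair of values
  if (kv[2] != name) { name = kv[2]; v1 = $2; v2 = $3; n = 0 }
  # odd lines within the set get the remembered values
  if (++n % 2 == 1) { $2 = v1; $3 = v2 }
  print
}' sample.txt)

printf '%s\n' "$result"
```

Like the verbose version, this counts lines within each set rather than using NR, so it should not matter whether an earlier set happens to end on an odd or even line of the file.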


 