
Change a text file from the Linux command line

I have a big file like this example:

#name   chrom   strand  txStart txEnd   cdsStart    cdsEnd  exonCount   exonStarts  exonEnds    proteinID   alignID
uc001aaa.3  chr1    +   11873   14409   11873   11873   3   11873,12612,13220,  12227,12721,14409,      uc001aaa.3
uc010nxr.1  chr1    +   11873   14409   11873   11873   3   11873,12645,13220,  12227,12697,14409,      uc010nxr.1
uc010nxq.1  chr1    +   11873   14409   12189   13639   3   11873,12594,13402,  12227,12721,14409,  B7ZGX9  uc010nxq.1
uc009vis.3  chr1    -   14361   16765   14361   14361   4   14361,14969,15795,16606,    14829,15038,15942,16765,        uc009vis.3

I want to change the 4th column. Each element in column 4 should be replaced by the element from column 5 of the same row, minus 1; that is, the new value of column 4 is "(element of column 5) - 1". I am not very familiar with the Linux command line (shell). Do you know how I can do that in a single line? Here is the expected output:

#name   chrom   strand  txStart txEnd   cdsStart    cdsEnd  exonCount   exonStarts  exonEnds    proteinID   alignID
uc001aaa.3  chr1    +   14408   14409   11873   11873   3   11873,12612,13220,  12227,12721,14409,      uc001aaa.3
uc010nxr.1  chr1    +   14408   14409   11873   11873   3   11873,12645,13220,  12227,12697,14409,      uc010nxr.1
uc010nxq.1  chr1    +   14408   14409   12189   13639   3   11873,12594,13402,  12227,12721,14409,  B7ZGX9  uc010nxq.1
uc009vis.3  chr1    -   16764   16765   14361   14361   4   14361,14969,15795,16606,    14829,15038,15942,16765,        uc009vis.3

awk is a great tool for manipulating files like this. It processes a file as a sequence of records made up of fields; by default, records are the lines of the file and fields are separated by whitespace (spaces or tabs). The awk command line to do what you want is:

awk '!/^#/ { $4 = $5 - 1 } { print }' <filename>

An awk program is a sequence of pattern-action pairs. If a pattern is omitted, the action is performed for all input records; if an action is omitted (not used in this program), the default action is to print the record. Fields are referenced in an awk program as $n, where n is the field number. There are several forms of pattern, but the one used here is the negation of a regular expression that is matched against the whole record. So this program updates the 4th field to be the value of the 5th field minus 1, but only for lines that do not start with a #, to avoid messing up the header. Then, for all records (because the pattern is omitted), the record is printed. The pattern-action pairs are evaluated in order, so the record is printed after the 4th field has been updated.
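One detail worth checking on the real data: assigning to a field makes awk rebuild the record with the output field separator, which is a single space by default, and default whitespace splitting collapses empty columns (such as a blank proteinID). The sketch below assumes the file is tab-delimited, which the question does not state; the file name sample.tsv and the reduced set of seven columns are made up for illustration.

```shell
# Build a small tab-delimited sample (hypothetical file name and columns).
# The first data row has an empty proteinID field (two tabs in a row).
printf '#name\tchrom\tstrand\ttxStart\ttxEnd\tproteinID\talignID\n' > sample.tsv
printf 'uc001aaa.3\tchr1\t+\t11873\t14409\t\tuc001aaa.3\n' >> sample.tsv
printf 'uc010nxq.1\tchr1\t+\t11873\t14409\tB7ZGX9\tuc010nxq.1\n' >> sample.tsv

# FS = "\t" splits on tabs only, so empty fields keep their position;
# OFS = "\t" makes awk rejoin the modified record with tabs.
awk 'BEGIN { FS = OFS = "\t" } !/^#/ { $4 = $5 - 1 } { print }' sample.tsv
```

If the file turns out to be space-delimited, the original one-liner is the simpler choice.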

Save your content in a file named a, then run:

awk '{ if (NR > 1) { $4 = $5 - 1; print $0 } else { print $0 } }' a
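Either command only prints the result to standard output; the file itself is left unchanged. Redirecting straight back to the input file would truncate it before awk reads it, so a common approach is to write to a temporary file and rename it afterwards. A minimal sketch, assuming a space-delimited file with a single header line; the file name genes.txt and its contents are hypothetical:

```shell
# Hypothetical input file; replace with your own.
file=genes.txt
printf '#name chrom strand txStart txEnd\n' > "$file"
printf 'uc001aaa.3 chr1 + 11873 14409\n' >> "$file"

# Write the result to a temporary file, then replace the original
# only if awk finished successfully.
awk 'NR > 1 { $4 = $5 - 1 } { print }' "$file" > "$file.tmp" &&
mv "$file.tmp" "$file"

# Show the updated file.
cat "$file"
```

Here NR > 1 skips the first line, as in the command above; the !/^#/ pattern from the other answer would work equally well.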


 