How to rename multiple columns in multiple files?
I have multiple files with names like this:
trans_ENSG00000047849.txt.traw
trans_ENSG00000047848.txt.traw
trans_ENSG00000047847.txt.traw
...
Each file has around 300 columns, and the column names look like this:
NA20826_NA20826 NA20828_NA20828 NA20819_NA20819
I would like the column names in all files to have this form instead:
NA20826 NA20828 NA20819
In other words, I would like to remove everything from the underscore _ onwards in every column name, in every file.
I should mention that there is a tab character at the beginning of each file.
I tried this:
sed -ri 's/[_].*$//' trans_*.txt.traw
but when I tried to open one of the transformed files in R, I got this error:
> e=read.table("trans_ENSG00000135541.txt.traw", header=TRUE)
Error in read.table("trans_ENSG00000135541.txt.traw", header = TRUE) :
more columns than column names
I guess you actually want this:我猜你真的想要这个:
$ echo -e "\tNA20826_NA20826\tNA20828_NA20828\tNA20819_NA20819" | sed -r '1s/_[^\t]*//g'
NA20826 NA20828 NA20819
The pattern _[^\t]* works because the file is TAB-separated: each match starts at an underscore and runs up to, but not including, the next TAB (or the end of the line), and that is exactly the part to delete.
The g flag replaces all occurrences on the line, not just the first one.
The leading 1 restricts the substitution to the first line, which is the header line.
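To apply the same substitution to the real files rather than to an echo test, something like the following could work. It is a minimal sketch assuming GNU sed, where -i processes each file separately so the 1 address matches the first line of every file. The function name strip_header_suffix and the demo file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed (-r, -i and \t inside [...] are GNU features).

# strip_header_suffix is a hypothetical helper name, not part of sed.
# It edits each given file in place, touching only line 1 (the header).
strip_header_suffix() {
    # With -i, GNU sed treats every file separately, so "1" is the
    # first line of each file rather than of the concatenated input.
    sed -ri '1s/_[^\t]*//g' "$@"
}

# Demo on throwaway files so that real data is not modified.
demo_dir=$(mktemp -d)
printf '\tNA20826_NA20826\tNA20828_NA20828\n0\t1\t2\n' > "$demo_dir/trans_demo1.txt.traw"
printf '\tNA20819_NA20819\tNA20826_NA20826\n2\t0\t1\n' > "$demo_dir/trans_demo2.txt.traw"

strip_header_suffix "$demo_dir"/trans_*.txt.traw

# Show the rewritten header of each demo file.
head -n 1 "$demo_dir"/trans_*.txt.traw
```

On the real data the call would be strip_header_suffix trans_*.txt.traw, ideally after checking the result on a copy first.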
Your own substitute command, 's/[_].*$//', deletes everything from the first _ to the end of the line, so the header ends up with only one column name left. That is why R reports more columns than column names.
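The difference between the two patterns can be checked on a single header line. This is only an illustrative sketch assuming GNU sed and a POSIX awk; the helper name count_fields and the three-column sample header are assumptions, not part of the original files.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed and a POSIX awk.

# A three-column header with a leading TAB, as described in the question.
header=$(printf '\tNA20826_NA20826\tNA20828_NA20828\tNA20819_NA20819')

# count_fields is a hypothetical helper: it prints the number of
# TAB-separated fields on the first line it reads.
count_fields() {
    awk -F'\t' 'NR == 1 { print NF }'
}

# Original attempt: .* is greedy and runs to the end of the line.
greedy=$(printf '%s\n' "$header" | sed -r 's/[_].*$//' | count_fields)

# Suggested fix: [^\t]* stops at the next TAB.
fixed=$(printf '%s\n' "$header" | sed -r '1s/_[^\t]*//g' | count_fields)

echo "fields after original command:  $greedy"
echo "fields after suggested command: $fixed"
```

Both counts include the empty field in front of the leading TAB, so the original command should leave a single column name while the suggested one keeps all three.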
The sed command you need is:
sed -ri 's/_\S*//g' trans_*.txt.traw
This regexp removes part of every word, from the underscore up to the next space or tab character, regardless of how many columns each line has.
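Because -i overwrites the files, it may be worth previewing the result and keeping backups. The sketch below assumes GNU sed (\S and -i with a suffix are GNU features); the helper name rename_with_backup, the .bak suffix and the demo file are arbitrary choices. Note that without a line address this substitution runs on every line, so data values containing an underscore would be shortened as well.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed.

# rename_with_backup is a hypothetical helper name.
# It applies the substitution in place and keeps the original
# content of each file next to it as <file>.bak.
rename_with_backup() {
    sed -r -i.bak 's/_\S*//g' "$@"
}

# Throwaway demo file whose header mixes a space and TABs as separators.
demo_dir=$(mktemp -d)
demo_file="$demo_dir/trans_demo.txt.traw"
printf '\tNA20826_NA20826 NA20828_NA20828\tNA20819_NA20819\n0\t1\t2\t1\n' > "$demo_file"

# Preview: without -i, sed writes to stdout and leaves the file alone.
sed -r 's/_\S*//g' "$demo_file" | head -n 1

# Apply for real, then compare the new header with the backup.
rename_with_backup "$demo_file"
head -n 1 "$demo_file" "$demo_file.bak"
```

Once the rewritten files load correctly in R, the .bak copies can be deleted.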