How to rename multiple columns in multiple files?

I have multiple files which look like this:

trans_ENSG00000047849.txt.traw
trans_ENSG00000047848.txt.traw
trans_ENSG00000047847.txt.traw
...

In them I have around 300 columns, and column names look like this:

NA20826_NA20826 NA20828_NA20828 NA20819_NA20819

I would like the column names in all files to have this form instead:

NA20826 NA20828 NA20819

In other words, I would like to remove the underscore _ and everything after it in every column name, in every file.

I should mention that there is a tab at the beginning of each file.

I tried this:

sed -ri 's/[_].*$//' trans_*.txt.traw

but when I tried to open one of the transformed files in R, I got this error:

> e=read.table("trans_ENSG00000135541.txt.traw", header=TRUE)
Error in read.table("trans_ENSG00000135541.txt.traw", header = TRUE) : 
  more columns than column names

I guess you actually want this:

$ echo -e "\tNA20826_NA20826\tNA20828_NA20828\tNA20819_NA20819" | sed -r '1s/_[^\t]*//g'
        NA20826 NA20828 NA20819

_[^\t]* : the file is tab-separated, so everything from the _ up to the next tab (or the end of the line) is deleted.
The g flag replaces all occurrences in the line.
The leading 1 restricts the substitution to the first line, i.e. the header line.

Your own substitute command, 's/[_].*$//', replaces everything from the first _ to the end of the line, so you end up with only one column name left.
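To apply this to every file in place, the same expression can be combined with the -i flag from the question. The following is only a sketch: it assumes GNU sed (for -r, -i without a suffix, and \t inside a bracket expression), and it works on a throwaway file named trans_demo.txt.traw with a made-up data row, both of which are hypothetical.

```shell
# Work in a temporary directory so no real data is touched.
tmpdir=$(mktemp -d)

# Hypothetical sample file: a leading tab, doubled sample IDs, and one invented data row.
printf '\tNA20826_NA20826\tNA20828_NA20828\tNA20819_NA20819\n' > "$tmpdir/trans_demo.txt.traw"
printf 'rs1_A\t0\t1\t2\n' >> "$tmpdir/trans_demo.txt.traw"

# Edit only line 1 of each matching file, in place (GNU sed assumed).
sed -ri '1s/_[^\t]*//g' "$tmpdir"/trans_*.txt.traw

# Read back the header and the data row to check the result.
header=$(head -n 1 "$tmpdir/trans_demo.txt.traw")
datarow=$(sed -n '2p' "$tmpdir/trans_demo.txt.traw")
printf '%s\n%s\n' "$header" "$datarow"

rm -rf "$tmpdir"
```

Running it on copies first, or using -i.bak to keep backups, is a reasonable precaution before touching the real files.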

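The sed command you need is:

sed -ri 's/_\S*//g' trans_*.txt.traw

This regexp removes part of every word, from the underscore up to the next space or tab character, no matter how many columns each line has. Note that, without a line address, it is applied to every line of the file, not only the header.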
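The effect of the line address can be checked on a small piped input before editing anything in place. This is a sketch assuming GNU sed (\S is a GNU extension); the data row rs1_A is invented for illustration and is not taken from the question's files.

```shell
# Hypothetical two-line input: a header row and one data row whose ID contains an underscore.
sample=$(printf '\tNA20826_NA20826\tNA20828_NA20828\nrs1_A\t0\t1')

# Without an address, the substitution runs on every line, so the data row is changed too.
all_lines=$(printf '%s\n' "$sample" | sed -r 's/_\S*//g')

# With the address 1, only the header line is edited.
header_only=$(printf '%s\n' "$sample" | sed -r '1s/_\S*//g')

printf '%s\n---\n%s\n' "$all_lines" "$header_only"
```

If the data rows never contain underscores, both forms give the same result; otherwise the addressed form 1s/.../ is the safer choice.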
