This is an example of what the data I have looks like:
ID pos gene
1 SAMPLE1 1234 BRCA
2 SAMPLE2 2910 EGFR
3 SAMPLE3 1271 MYC
This is the desired output
ID pos gene
SAMPLE1 1234 BRCA
SAMPLE2 2910 EGFR
SAMPLE3 1271 MYC
I tried cut -f2- mydata.txt
but that removes the first column from the header line as well, and I would still like to keep ID as the column name.
$ sed -E '2,$s/[^ ]+ +//' file
Using awk and sub to preserve the spaces. If the file is tab-separated, forget this one:
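$ awk '
FNR==1 {                      # first record (the header)
    nf=NF                     # store its field count in nf
}
NF>nf {                       # record has more fields than the header
    n=NF-nf                   # count the extras once, since sub() below changes NF
    for(i=1;i<=n;i++)         # remove the n leading fields using sub
        sub(/^[^ ]+ +/,"")
}1' file                      # output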
Output:
ID pos gene
SAMPLE1 1234 BRCA
SAMPLE2 2910 EGFR
SAMPLE3 1271 MYC
How can I remove the first column in every row except the header row in shell?
You may use this awk:
awk 'NR > 1 {sub(/^[ \t]*[^ \t]+[ \t]+/, "")} 1' file
ID pos gene
SAMPLE1 1234 BRCA
SAMPLE2 2910 EGFR
SAMPLE3 1271 MYC
head -n1 mydata.txt; tail -n +2 mydata.txt | cut -d' ' -f2-
A trick for skipping the first line in something that doesn't otherwise have a way to handle a header line is to use a command group where everything in it shares the same standard input and output streams. You read the first line and echo it, and then the real program works on the rest of the input:
{ IFS= read -r line && echo "$line" && cut -f2-; } < mydata.txt
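As a rough sketch of how the command group behaves, here it is run against a small made-up file. This assumes the data is separated by single spaces, as the sample in the question appears to be, so cut is given -d' ' (by default cut splits on tabs). The temporary file and the variable names tmp, header and out are only for illustration:

```shell
# Hypothetical sample input: space-separated, row numbers on the data lines only.
tmp=$(mktemp)
printf '%s\n' 'ID pos gene' '1 SAMPLE1 1234 BRCA' '2 SAMPLE2 2910 EGFR' > "$tmp"

# read consumes the header line from the shared stdin and printf echoes it;
# cut then sees only the remaining lines and drops their first field.
out=$({ IFS= read -r header && printf '%s\n' "$header" && cut -d' ' -f2-; } < "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Because read and cut share one input stream inside the braces, cut never sees the header, which is why it comes through untouched.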
$ awk 'NR>1{sub(/[^[:space:]]+[[:space:]]+/,"")}1' file
ID pos gene
SAMPLE1 1234 BRCA
SAMPLE2 2910 EGFR
SAMPLE3 1271 MYC
If your file is tab-separated AND might have empty 2nd fields or blanks in the 1st field, then just change [^[:space:]]+[[:space:]]+ to ^[^\t]*[\t]
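Put together, the tab-separated variant might look like the sketch below. The sample data is invented to exercise the two edge cases mentioned (a blank inside the first field and an empty second field); it is not the data from the question, and the names tmp and out are only for illustration:

```shell
tmp=$(mktemp)
# Hypothetical tab-separated input: a header, then data lines with one extra leading field.
printf 'ID\tpos\tgene\n' > "$tmp"
printf 'row 1\tSAMPLE1\t1234\tBRCA\n' >> "$tmp"   # blank inside the 1st field
printf '2\t\t2910\tEGFR\n' >> "$tmp"              # empty 2nd field

# On every line after the header, delete everything up to and including the first tab.
out=$(awk 'NR>1{sub(/^[^\t]*[\t]/,"")}1' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Anchoring at the start of the line and matching zero or more non-tab characters means only the first field and its tab are removed, so an empty second field is kept as an empty field.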