I have data that looks like this:
SNP NA18524 NA18526 NA18529 NA18537
Status Low Low High High
Pop ASN ASN CEU YRI
ENSG00000187634 6.425880 6.348570 6.464480 6.391740
I want to match and print only those columns that contain the string ASN, keeping the first column of row labels.
Later I will want to do the same for CEU only, and then for YRI only.
Do I want something like this?
sed 'p/[ASN]//g'
The output would therefore look like:
SNP NA18524 NA18526
Status Low Low
Pop ASN ASN
ENSG00000187634 6.425880 6.348570
Data is tab delimited.
Doing this in Python would probably be the least mind-bending of the several languages you mentioned in the title. It should be straightforward: read the lines (for line in open('myfile.tsv'):), tokenize (fields = line.split('\t')), match on your search string, and keep track of which columns you "like". Then do the whole thing a second time, printing the fields you now know you need.
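The steps above could be sketched roughly as follows. This is only one possible reading of the approach, not a definitive implementation: the function name select_columns and the SAMPLE constant are made up for illustration, the rows are held in memory instead of re-reading the file for the second pass, and the first column is assumed to hold row labels that should always be kept.

```python
# Sample data from the question, tab delimited (SAMPLE is an illustrative name).
SAMPLE = (
    "SNP\tNA18524\tNA18526\tNA18529\tNA18537\n"
    "Status\tLow\tLow\tHigh\tHigh\n"
    "Pop\tASN\tASN\tCEU\tYRI\n"
    "ENSG00000187634\t6.425880\t6.348570\t6.464480\t6.391740\n"
)


def select_columns(lines, wanted, delimiter="\t"):
    """Keep the label column plus every column in which some row equals `wanted`.

    `lines` is any iterable of text lines (an open file, a list of strings).
    Returns a list of rows, each row being a list of field strings.
    """
    # Tokenize every non-empty line.
    rows = [line.rstrip("\r\n").split(delimiter) for line in lines if line.strip()]

    # First pass: remember which column indexes contain the search string.
    keep = {0}  # assumption: column 0 holds row labels and is always printed
    for fields in rows:
        for i, value in enumerate(fields):
            if value == wanted:
                keep.add(i)

    # Second pass: emit only the remembered columns, in their original order.
    order = sorted(keep)
    return [[fields[i] for i in order if i < len(fields)] for fields in rows]


if __name__ == "__main__":
    for row in select_columns(SAMPLE.splitlines(), "ASN"):
        print("\t".join(row))
```

To run it on a real file, pass the open file object instead of the sample, for example select_columns(open('myfile.tsv'), 'ASN'). Changing the second argument to 'CEU' or 'YRI' selects the other populations.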
If you get stuck with the implementation, you might want to post that as a separate, more specific question.
Zero elegance, but it should work. It needs gawk, because ARGIND is a gawk extension.
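awk -F'\t' '{
    if (found != 1) {
        # first pass: look for the row that contains ASN
        for (i = 1; i <= NF; ++i) {
            if ($i == "ASN") {
                # remember which columns contain it
                idx[i] = 1
                found = 1
            }
        }
        # at least one column found?
        if (found == 1) {
            idx[1] = 1    # always keep the first column (row labels)
            # these three statements rewind the file (gawk only: ARGIND)
            ARGC++
            ARGV[ARGIND+1] = FILENAME
            nextfile
        }
    }
    else {
        # second pass: print the selected columns in their original order
        sep = ""
        for (i = 1; i <= NF; ++i) {
            if (i in idx) {
                printf("%s%s", sep, $i)
                sep = "\t"
            }
        }
        printf("\n")
    }
}' yourfile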