
Extract all columns that match a string (awk, sed, Python, Perl)

I have data that looks like this:

SNP              NA18524  NA18526  NA18529  NA18537
Status           Low      Low      High     High
Pop              ASN      ASN      CEU      YRI
ENSG00000187634  6.425880 6.348570 6.464480 6.391740

I want to print only the columns that contain the string ASN.

Later I will want to do the same for CEU, and then for YRI.

Do I want something like this?

 sed 'p/[ASN]//g'

The output would then look like this:

SNP              NA18524  NA18526
Status           Low      Low
Pop              ASN      ASN
ENSG00000187634  6.425880 6.348570

The data is tab-delimited.

Doing this in Python is probably the least mind-bending of the several languages you mentioned in the title. It should be straightforward: read the lines (for line in open('myfile.tsv'):), split each one into fields (fields = line.rstrip('\n').split('\t')), compare the fields with your search string, and keep track of which columns you "like". Then read the file a second time, printing only the fields you now know you need.
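
A minimal sketch of that two-pass approach, assuming a tab-delimited file, an exact (whole-field) match on the search string, and that the row labels in the first column should always be kept, as in the desired output in the question. The function names matching_columns and print_columns are hypothetical, chosen only for this example.

```python
import sys


def matching_columns(path, target, sep="\t"):
    """First pass: collect the index of every column in which at least
    one row holds exactly `target` (surrounding whitespace ignored)."""
    keep = set()
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\r\n").split(sep)
            for i, field in enumerate(fields):
                if field.strip() == target:
                    keep.add(i)
    return sorted(keep)


def print_columns(path, target, sep="\t", out=None):
    """Second pass: write column 0 (the row labels) plus every matching
    column, in their original left-to-right order."""
    if out is None:
        out = sys.stdout
    # Column 0 is always printed, so drop it from the match list.
    cols = [c for c in matching_columns(path, target, sep) if c != 0]
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\r\n").split(sep)
            # Skip indices that a short (ragged) row does not have.
            selected = [fields[0]] + [fields[i] for i in cols if i < len(fields)]
            out.write(sep.join(selected) + "\n")
```

For example, print_columns('myfile.tsv', 'ASN') prints the SNP, Status, Pop and ENSG00000187634 rows restricted to the two ASN columns; passing 'CEU' or 'YRI' instead handles the other populations.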

If you get stuck with the implementation, you might want to post that as a separate, more specific question.

Zero elegance, but it should work:

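awk -F'\t' -v OFS='\t' -v pat='ASN' '
{
    if (found != 1) {
        # first pass: look for a row in which some column equals pat
        for (i = 1; i <= NF; ++i) {
            if ($i == pat) {
                # remember which columns contain it
                idx[i] = 1
                found = 1
            }
        }

        # at least one column found?
        if (found == 1) {
            # these three statements queue the same file again so it is
            # re-read from the top; ARGIND is a gawk extension, so run with gawk
            ARGC++
            ARGV[ARGIND + 1] = FILENAME
            nextfile
        }
    }
    else {
        # second pass: always print the row label (column 1), then the
        # matching columns in their original left-to-right order
        # (a plain "for (a in idx)" loop would visit them in unspecified order)
        line = $1
        for (i = 2; i <= NF; ++i)
            if (i in idx)
                line = line OFS $i
        print line
    }
}' yourfile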
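
Both answers handle one search string per run, while the question also needs CEU and YRI. A minimal Python sketch that reads the table once and groups the columns by every label in the population row may help. It assumes the labels sit in a row whose first field is exactly "Pop" and that the whole file fits in memory; split_by_population is a hypothetical helper name, not an existing library function.

```python
from collections import defaultdict


def split_by_population(lines, label_row="Pop", sep="\t"):
    """Group the columns of a delimited table by the label each column
    holds in the row whose first field is `label_row`.

    Returns {label: rows}; each row keeps column 0 (the row labels) plus
    only that label's columns, in their original order.
    """
    # Parse every non-blank line into a list of fields.
    rows = [line.rstrip("\r\n").split(sep) for line in lines if line.strip()]

    # Locate the row carrying the population labels (assumed to be "Pop").
    labels = next((r for r in rows if r[0].strip() == label_row), None)
    if labels is None:
        raise ValueError(f"no row starting with {label_row!r}")

    # Map each label (ASN, CEU, YRI, ...) to the column indices holding it.
    columns = defaultdict(list)
    for i, value in enumerate(labels[1:], start=1):
        if value.strip():
            columns[value.strip()].append(i)

    # Build one sub-table per label, always keeping column 0.
    return {
        label: [[row[0]] + [row[i] for i in idx if i < len(row)] for row in rows]
        for label, idx in columns.items()
    }
```

Usage could look like: with open('myfile.tsv') as fh: groups = split_by_population(fh), after which groups['ASN'], groups['CEU'] and groups['YRI'] each hold the rows for one population, ready to be joined with tabs and written out. Holding the file in memory is the trade-off for getting every population from a single read.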

The technical posts on this site are published under the CC BY-SA 4.0 license.

 
© 2020-2024 STACKOOM.COM