I have a file, ftp.txt, which contains many records such as:
>KCY60942 pep:novel supercontig:GCA_000682575.1:ab248605.contig.36_1:19:588:-1 gene:J738_3590 transcript:KCY60942 description:"putative transposase 1"
MTHLNELYLILNKYLKWNKSHLKCFALIMLVIILKQTCNLSSASKALPIKCLPQSFYRRM
QRFFAGQYFDYRQISQLIFNMFSFDQVQLTLDRTNWKWGKRNINILMLAIVYRGIAIPIL
WTLLNKRGNSDTKERIALIQRFIAIFGKDRIVNVFADREFIGEQWFTWLIEQDINFCIRV
KKTSLSPII
>KCY61710 pep:novel supercontig:GCA_000682575.1:ab248605.contig.22_1:4164:6320:1 gene:J738_2986 transcript:KCY61710 description:"tonB-dependent siderophore receptor family protein"
MQRTTKHFQINALALAIAMSTISAHAETDQQTSEYGTLPTIKVKAGSGQENEKSYIAGKT
DTAVPLGLSVREVPQSVSVITQQRLQDQQLSTLVEVAENVTGVSVNRYETNRGGIYSRGF
VVDNYIIDGIPTTYSLPWSSGEIFSSMALYDHIDVVRGATGLTFGAGNPSAAINMVRKRA
TSTEPTANVEVSAGSWDNYRVMGDIANSLNQSGTVRGRAVAQYEQGDSYTDLLSKEKLSL
LLSAEADLSENTLLSGGVTYQEDDPRGPMWGGLPVWFSDGTKTNWSKNITTSADWTRWNV
KYTNLFADLTHKFNDNWSAKLSYSHGKRDANSKLLYVSGSVDKNTGLGLSPYASAYDLEV
EQDNASLQLNGSFDLWGLEQKVVLGYQYSNQDFTAYARSTDTKMEIGNFFEWNGSMPEPV
WNAPTLNEKYNIEQNALFAATYLNPIEPLKFILGGRFTNYEKNIYGRSSSIKYDHEFVPY
AGIIYDFNDVYTAYASYTSIFQPQDKKDFDGNYLDPVEGNSTEVGLKSAWFDGRLNGTLA
LYHIKQDNLAQEAGDVTRNGVKEIYYRAAKGATSEGFEVEVSGQITPDWNITAGYSQFSA
KDTNDVDVNTQLPRKMIQTFTTYKLSGKLENITVGGGVNWQSSTYINAENPKEVIEKVEQ
GDYALVNLMARYQITKDFSAQLNINNVFDKKYYGVFPAYGQITLGAPRNAALTLQYKF
I want to separate the records and save each one to a file with a different name. I tried the code below, but it prints only the lines that start with >:
#!/usr/local/bin/perl
open( FILE, "/home/httpd/cgi-bin/r/ftp.txt" );
while ( $line = <FILE> ) {
if ( $line =~ m/^\>/g ) {
print $line;
}
}
My desired output is the two records, the one starting with >KCY60942 and the one starting with >KCY61710, each saved in its own file: the >KCY60942 record in one file and the >KCY61710 record in another.
Here's another option:
use strict;
use warnings;
local $/ = '';
while (<>) {
my ($fileName) = /^>([^\s]+)/;
open my $fh, '>', "$fileName.txt" or die "Can't write to '$fileName.txt': $!";
print $fh $_;
close $fh;
}
Usage: perl script.pl inFile
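Since each (FASTA?) record is a paragraph, $/ is set to the empty string ('') to read the file in paragraph mode, one record at a time. Paragraph mode splits on blank lines, so this relies on the records being separated by at least one blank line. Each record's ID is captured for use as that record's file name, and then that record is written to its file.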
Hope this helps!
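If the records are not separated by blank lines, paragraph mode will return the whole file as a single record. A minimal sketch of one way around that, assuming the file begins with a header line and that the IDs are safe to use as file names: set $/ to "\n>" so that each read stops where the next header begins. The names split_fasta and $out_dir are hypothetical and not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: write each FASTA record in $in_path to "$out_dir/<id>.txt"
# and return the list of files written.
sub split_fasta {
    my ($in_path, $out_dir) = @_;
    my @written;

    open my $in, '<', $in_path or die "Can't read '$in_path': $!";
    local $/ = "\n>";    # a record ends where the next header line begins

    while (my $record = <$in>) {
        chomp $record;               # remove the trailing "\n>" separator, if present
        $record =~ s/^>//;           # only the first record still carries its ">"
        $record =~ s/\s+\z//;        # trim trailing whitespace on the last record
        next unless $record =~ /^(\S+)/;    # skip empty chunks
        my $id = $1;

        my $out_path = "$out_dir/$id.txt";
        open my $out, '>', $out_path or die "Can't write '$out_path': $!";
        print {$out} ">$record\n";   # restore the ">" and the final newline
        close $out or die "Can't close '$out_path': $!";
        push @written, $out_path;
    }
    close $in;
    return @written;
}

# Usage: perl script.pl inFile [outDir]
if (@ARGV) {
    my ($in_path, $out_dir) = @ARGV;
    print "$_\n" for split_fasta($in_path, $out_dir // '.');
}
```

Because the separator is consumed along with each record, the ">" is stripped and then added back when writing, so every output file starts with its own header line.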
Something like this should do the trick:
#!/usr/local/bin/perl
use strict;
use warnings;
open( my $file, "<", "/home/httpd/cgi-bin/r/ftp.txt" ) or die $!;
open( my $output, ">", "pre-match" ) or die $!;
while ( my $line = <$file> ) {
if ( $line =~ m/^>/ ) {
my ($output_name) = ( $line =~ m/^\>(\w+)/ );
close($output);
open( $output, ">", $output_name . ".output" ) or die $!;
}
print {$output} $line;
}
close($output);
If your line matches that regular expression, we 'pick out' the first word (so KCY61710 etc.) and open a file called KCY61710.output.
We print each line as we go to this output, closing and re-opening each time we hit one of those lines.
A pre-match file exists in case the first line(s) don't match this pattern.
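To see what 'picking out the first word' does in isolation, here is a small sketch of the capture step. The helper name header_id is hypothetical, and the sample lines are shortened from the question's data.

```perl
use strict;
use warnings;

# Hypothetical helper: return the ID from a FASTA header line,
# or undef when the line is not a header.
sub header_id {
    my ($line) = @_;
    my ($id) = $line =~ /^>(\w+)/;    # \w+ matches letters, digits and underscore
    return $id;
}

for my $line (
    '>KCY60942 pep:novel gene:J738_3590',    # header line
    'MTHLNELYLILNKYLKWNKSHLKCFALIMLVIILKQ',  # sequence line, not a header
) {
    my $id = header_id($line);
    print defined $id ? "header, id = $id\n" : "not a header\n";
}
```

Note that \w+ stops at the first character that is not a letter, digit or underscore, so an ID containing a dot or hyphen would be cut short; a pattern such as \S+ would instead capture everything up to the first whitespace.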