
How to split one file into multiple files using perl?

I have a file, ftp.txt, that contains many entries ("versions") such as:

>KCY60942 pep:novel supercontig:GCA_000682575.1:ab248605.contig.36_1:19:588:-1 gene:J738_3590 transcript:KCY60942 description:"putative transposase 1"
MTHLNELYLILNKYLKWNKSHLKCFALIMLVIILKQTCNLSSASKALPIKCLPQSFYRRM
QRFFAGQYFDYRQISQLIFNMFSFDQVQLTLDRTNWKWGKRNINILMLAIVYRGIAIPIL
WTLLNKRGNSDTKERIALIQRFIAIFGKDRIVNVFADREFIGEQWFTWLIEQDINFCIRV
KKTSLSPII

>KCY61710 pep:novel supercontig:GCA_000682575.1:ab248605.contig.22_1:4164:6320:1 gene:J738_2986 transcript:KCY61710 description:"tonB-dependent siderophore receptor family protein"
MQRTTKHFQINALALAIAMSTISAHAETDQQTSEYGTLPTIKVKAGSGQENEKSYIAGKT
DTAVPLGLSVREVPQSVSVITQQRLQDQQLSTLVEVAENVTGVSVNRYETNRGGIYSRGF
VVDNYIIDGIPTTYSLPWSSGEIFSSMALYDHIDVVRGATGLTFGAGNPSAAINMVRKRA
TSTEPTANVEVSAGSWDNYRVMGDIANSLNQSGTVRGRAVAQYEQGDSYTDLLSKEKLSL
LLSAEADLSENTLLSGGVTYQEDDPRGPMWGGLPVWFSDGTKTNWSKNITTSADWTRWNV
KYTNLFADLTHKFNDNWSAKLSYSHGKRDANSKLLYVSGSVDKNTGLGLSPYASAYDLEV
EQDNASLQLNGSFDLWGLEQKVVLGYQYSNQDFTAYARSTDTKMEIGNFFEWNGSMPEPV
WNAPTLNEKYNIEQNALFAATYLNPIEPLKFILGGRFTNYEKNIYGRSSSIKYDHEFVPY
AGIIYDFNDVYTAYASYTSIFQPQDKKDFDGNYLDPVEGNSTEVGLKSAWFDGRLNGTLA
LYHIKQDNLAQEAGDVTRNGVKEIYYRAAKGATSEGFEVEVSGQITPDWNITAGYSQFSA
KDTNDVDVNTQLPRKMIQTFTTYKLSGKLENITVGGGVNWQSSTYINAENPKEVIEKVEQ
GDYALVNLMARYQITKDFSAQLNINNVFDKKYYGVFPAYGQITLGAPRNAALTLQYKF

I want to separate the versions and save each one under a different file name. I tried the code below, but it prints only the lines that start with >:

#!/usr/local/bin/perl
open( FILE, "/home/httpd/cgi-bin/r/ftp.txt" );
while ( $line = <FILE> ) {
    if ( $line =~ m/^\>/g ) {
        print $line;
    }
}

My desired output is for the two versions above, which start with >KCY60942 and >KCY61710, to be saved in separate files: the >KCY60942 record in one file and the >KCY61710 record in another.

Here's another option:

use strict;
use warnings;

local $/ = '';

while (<>) {
    my ($fileName) = /^>([^\s]+)/;
    open my $fh, '>', "$fileName.txt" or die "Can't write to '$fileName.txt': $!";
    print $fh $_;
    close $fh;
}

Usage: perl script.pl inFile

Since each (FASTA?) record is a paragraph, $/ is set to empty ( '' ) to read the file in paragraph mode--one 'record' at a time. Each record's id is captured for use as that record's file name, and then that record is written to its file.
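
To see what paragraph mode returns without touching any files, here is a minimal sketch that reads from an in-memory string instead of a file. The helper name split_records and the sample data are made up for illustration; they are not part of the answer above. It also skips any block that does not start with a > header, which the script above does not guard against.

```perl
use strict;
use warnings;

# Hypothetical helper: split FASTA-like text into records, keyed by id.
sub split_records {
    my ($text) = @_;
    my %records;

    # Open a read handle on the string itself (in-memory file).
    open my $in, '<', \$text or die "Can't open in-memory handle: $!";

    local $/ = '';    # paragraph mode: each read returns one blank-line-separated block

    while ( my $record = <$in> ) {
        # Capture the id after '>'; skip blocks that have no header line.
        my ($id) = $record =~ /^>(\S+)/ or next;
        $records{$id} = $record;
    }
    close $in;
    return \%records;
}

# Made-up sample data with two records separated by a blank line.
my $sample  = ">id1 first\nAAAA\nCCCC\n\n>id2 second\nGGGG\n";
my $records = split_records($sample);
print "Found record: $_\n" for sort keys %$records;
```

Because the records are split on blank lines, this approach relies on the input having an empty line between entries, as the sample in the question does.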

Hope this helps!

Something like this should do the trick:

#!/usr/local/bin/perl

use strict;
use warnings;

open( my $file, "<", "/home/httpd/cgi-bin/r/ftp.txt" ) or die $!;
open( my $output, ">", "pre-match" ) or die $!;

while ( my $line = <$file> ) {
    if ( $line =~ m/^\>/ ) {
        my ($output_name) = ( $line =~ m/^\>(\w+)/ );
        close($output);
        open( $output, ">", $output_name . ".output" ) or die $!;
    }
    print {$output} $line;
}

close($output);

If a line matches that regular expression (that is, it starts with >), we 'pick out' the first word (KCY61710 etc.) and open a file called KCY61710.output .

We print each line to the current output file as we go, closing it and opening a new one each time we hit one of those header lines.

A pre-match file exists in case the first line(s) don't match this pattern.
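
If you would rather not create a pre-match file at all, one possible variation is to open the output lazily and drop any lines that come before the first header. This is only a sketch: split_fasta is a hypothetical helper name, and the choice to discard leading lines is an assumption, not something the question asks for.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from $in and write each record to
# "$dir/<id>.output". Returns the list of files it created.
sub split_fasta {
    my ( $in, $dir ) = @_;
    my ( $out, @written );

    while ( my $line = <$in> ) {
        if ( $line =~ /^>(\w+)/ ) {
            # A header line starts a new record: switch output files.
            close $out if $out;
            my $path = "$dir/$1.output";
            open $out, '>', $path or die "Can't write to '$path': $!";
            push @written, $path;
        }

        # Lines seen before the first header are dropped (assumption).
        print {$out} $line if $out;
    }
    close $out if $out;
    return @written;
}

# Usage: perl script.pl inFile
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Can't read '$ARGV[0]': $!";
    print "Wrote $_\n" for split_fasta( $in, '.' );
    close $in;
}
```

Unlike the paragraph-mode approach, this line-by-line version does not depend on blank lines between records.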

