简体   繁体   English

如何使用perl将一个文件拆分为多个文件?

[英]How to split one file into multiple files using perl?

I have the file as ftp.txt which contains many versions of lines such as我的文件为 ftp.txt,其中包含许多版本的行,例如

>KCY60942 pep:novel supercontig:GCA_000682575.1:ab248605.contig.36_1:19:588:-1 gene:J738_3590 transcript:KCY60942 description:"putative transposase 1"
MTHLNELYLILNKYLKWNKSHLKCFALIMLVIILKQTCNLSSASKALPIKCLPQSFYRRM
QRFFAGQYFDYRQISQLIFNMFSFDQVQLTLDRTNWKWGKRNINILMLAIVYRGIAIPIL
WTLLNKRGNSDTKERIALIQRFIAIFGKDRIVNVFADREFIGEQWFTWLIEQDINFCIRV
KKTSLSPII

>KCY61710 pep:novel supercontig:GCA_000682575.1:ab248605.contig.22_1:4164:6320:1 gene:J738_2986 transcript:KCY61710 description:"tonB-dependent siderophore receptor family protein"
MQRTTKHFQINALALAIAMSTISAHAETDQQTSEYGTLPTIKVKAGSGQENEKSYIAGKT
DTAVPLGLSVREVPQSVSVITQQRLQDQQLSTLVEVAENVTGVSVNRYETNRGGIYSRGF
VVDNYIIDGIPTTYSLPWSSGEIFSSMALYDHIDVVRGATGLTFGAGNPSAAINMVRKRA
TSTEPTANVEVSAGSWDNYRVMGDIANSLNQSGTVRGRAVAQYEQGDSYTDLLSKEKLSL
LLSAEADLSENTLLSGGVTYQEDDPRGPMWGGLPVWFSDGTKTNWSKNITTSADWTRWNV
KYTNLFADLTHKFNDNWSAKLSYSHGKRDANSKLLYVSGSVDKNTGLGLSPYASAYDLEV
EQDNASLQLNGSFDLWGLEQKVVLGYQYSNQDFTAYARSTDTKMEIGNFFEWNGSMPEPV
WNAPTLNEKYNIEQNALFAATYLNPIEPLKFILGGRFTNYEKNIYGRSSSIKYDHEFVPY
AGIIYDFNDVYTAYASYTSIFQPQDKKDFDGNYLDPVEGNSTEVGLKSAWFDGRLNGTLA
LYHIKQDNLAQEAGDVTRNGVKEIYYRAAKGATSEGFEVEVSGQITPDWNITAGYSQFSA
KDTNDVDVNTQLPRKMIQTFTTYKLSGKLENITVGGGVNWQSSTYINAENPKEVIEKVEQ
GDYALVNLMARYQITKDFSAQLNINNVFDKKYYGVFPAYGQITLGAPRNAALTLQYKF

my query is to separate each version and want to save it each version with different file names?我的查询是将每个版本分开并希望使用不同的文件名保存每个版本? i tried the below code but i get only the line which startsup我尝试了下面的代码,但我只得到了启动的那一行

#!/usr/local/bin/perl
open( FILE, "/home/httpd/cgi-bin/r/ftp.txt" );
while ( $line = <FILE> ) {
    if ( $line =~ m/^\>/g ) {
        print $line;
    }
}

my desired output should be those two different versions which starts as like this >KCY60942 and >KCY61710 must saved in different filenames such as >KCY60942 should be saved in one file name and >KCY61710 it should be saved in another file name.我想要的输出应该是这两个不同的版本,它们以这样的方式开始

Here's another option:这是另一种选择:

use strict;
use warnings;

local $/ = '';

while (<>) {
    my ($fileName) = /^>([^\s]+)/;
    open my $fh, '>', "$fileName.txt" or die "Can't write to '$fileName.txt'";
    print $fh $_;
    close $fh;
}

Usage: perl script.pl inFile用法: perl script.pl inFile

Since each (FASTA?) record is a paragraph, $/ is set to empty ( '' ) to read the file in paragraph mode--one 'record' at a time.由于每个 (FASTA?) 记录都是一个段落,因此$/设置为空 ( '' ) 以在段落模式下读取文件——一次一个“记录”。 Each record's id is captured for use as that record's file name, and then that record is written to its file.捕获每个记录的 id 以用作该记录的文件名,然后将该记录写入其文件。

Hope this helps!希望这可以帮助!

Something like this should do the trick:像这样的事情应该可以解决问题:

#!/usr/local/bin/perl

use strict;
use warnings;

open( my $file, "<", "/home/httpd/cgi-bin/r/ftp.txt" );
open( my $output, ">", "pre-match" ) or die $!;

while ( my $line = <$file> ) {
    if ( $line =~ m/^\>/g ) {
        my ($output_name) = ( $line =~ m/^\>(\w+)/ );
        close($output);
        open( $output, ">", $output_name . ".output" ) or die $!;
    }
    print {$output} $line;
}

close($output);

If your line matches that regular expression, we 'pick out' the first word (so KCY61710 etc.) and open a file called KCY61710.output .如果您的行与该正则表达式匹配,我们“挑选”第一个单词( KCY61710等)并打开一个名为KCY61710.output的文件。

We print each line as we go to this output, closing and re-opening each time we hit one of those lines.当我们进入这个输出时,我们打印每一行,每次我们点击其中一行时关闭并重新打开。

A pre-match file exists in case the first line(s) don't match this pattern.如果第一行与此模式不匹配,则存在pre-match文件。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM