
How to extract specific values from a file and print them into another file

I have PDB files that are named by their PDB ID, for instance 2KRJ.pdb. I want to extract from them only the lines that begin with ATOM or HETATM, and copy them into a new file with the same name and a .txt extension, for instance 2KRJ.txt.

I know how to extract those lines, but I am having trouble copying them to another file.

This is the script that I have written so far for extracting:

#!/usr/bin/perl -w

$dirname = '.';
opendir(DIR, $dirname) or die "cannot open directory";
@files = grep(/\.pdb$/,readdir(DIR));

foreach $files ( @files ) {

    open (FH, $files) or die "could not open $files\n";
    @file_each = <FH>;
    #print @file_each;
    #print "$file\n";
    close FH;

    #$dir_sz = scalar @files;
    #print "$dir_sz\n";
    close DIR;

    my @ac        = ();
    my @dr        = ();
    my @os        = ();
    my @names     = ();
    my @ion_names = ();
    my $flag      = 0;

    for ( my $line = 0; $line <= $#file_each; $line++ ) {  # loop reading each line from the @file up to the end of file  

        chomp( $file_each[$line] );

        if ( $file_each[$line] =~ /^HEADER/ ) {

            my @id       = split '\s+', $file_each[$line];
            my $filename = pop @id;
            $filename    = "$filename.pdb";

            while ( $file_each[$line] !~ /^END/ ) { # read the lines until you get the symbol 'END'

                $line++;

                if ( $file_each[$line] =~/^ATOM|^HETATM/ ) {

                    $file_each[$line] =~ s/^ATOM|^HETATM//;

                    @xyz = split '\s+', $file_each[$line];
                    chomp @xyz[0,6,7,8];
                    print join (':', @xyz), "\n";

                    push @coord, @xyz[0,6,7,8];
                    print "@coord\n";
                }

                open (OUTPUT, ">$filename.txt"); 
                print(OUTPUT "@coord\n"); 
                close OUTPUT;
            }
        }
    }
}

The problem is that this script does not print the first column, and the output is disorganised: the rows do not have four columns each.

The lines that I am trying to extract look like this:

ATOM    946  OH  TYR A  59      37.734  36.478  24.541  1.00  0.00           O  
ATOM    947  H   TYR A  59      33.478  35.320  18.896  1.00  0.00           H  

And I am trying to change the script so that the new text file contains only this:

ATOM   37.734  36.478  24.541          
ATOM   33.478  35.320  18.896 

But I am getting this:

 .326 2.859  229 -18.940 4.490  230 -23.744 0.422  230 -24.558 -0.785  230  
 -24.256 -1.547  230 -23.137 -2.012  230 -24.338 -1.681  230 -25.135 -2.969   
 230 -26.307 -2.940  230 -24.589 -4.016  230 -22.773 0.364  231 -25.257   
-1.661  231 -25.103 -2.360  231 -26.141 -3.471  231 -27.309 -3.282  231   
-25.252 -1.396  

This will do as you ask.

Do you see how trying to hack an existing program leads you to write far too much code, and so increases the chances of a bug? Please learn to program in Perl and stop relying on freebies from generous souls.
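To make the bug concrete, here is a minimal sketch of what the question's substitution and split do to one of the sample lines. It reuses only those two statements from the question's script; the variable name `$stripped` is introduced here for illustration.

```perl
use strict;
use warnings;

# Sample line taken from the question
my $line = 'ATOM    946  OH  TYR A  59      37.734  36.478  24.541  1.00  0.00           O';

# The question's substitution deletes the record name itself ...
( my $stripped = $line ) =~ s/^ATOM|^HETATM//;

# ... and splitting on the pattern \s+ (unlike a bare split) keeps
# the empty field in front of the leading whitespace
my @xyz = split /\s+/, $stripped;

print "[$xyz[0]]\n";           # the "first column" is now an empty string
print "@xyz[0,6,7,8]\n";       # an empty field followed by x, y and z
```

The same script also never empties `@coord` and reopens the output file with `>` on every pass through the `while` loop, so each file would end up holding a single, ever-growing line, which would explain the missing row structure.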

use strict;
use warnings 'all';
use autodie;

for my $pdb ( glob '*.pdb' ) {

    open my $fh, '<', $pdb;
    my $out_fh;

    while ( <$fh> ) {
        next unless my @fields = split;

        if ( $fields[0] eq 'HEADER' ) {
            open $out_fh, '>', "$fields[-1].txt";
        }
        elsif ( $fields[0] eq 'ATOM' or $fields[0] eq 'HETATM' ) {

            unless ( $out_fh ) {
                warn qq{No ID found for file "$pdb"};
                last;
            }

            print $out_fh "@fields[0,6,7,8]\n";
        }
    }
}

Output

ATOM 15.200 27.271 13.911
ATOM 15.336 27.312 15.415
ATOM 16.364 26.299 15.932
ATOM 16.167 25.081 15.787
ATOM 14.019 26.968 16.088
ATOM 14.198 27.038 17.607
ATOM 13.515 25.568 15.575
ATOM 14.524 28.415 18.088
ATOM 17.456 26.771 16.532
ATOM 18.424 25.815 17.028
ATOM 19.122 26.165 18.302
ATOM 19.066 27.314 18.764
...
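
One caveat, offered as a sketch rather than a correction: PDB is a fixed-column format, so a plain `split` can miscount fields when the chain ID is blank or when wide coordinate values run together. Assuming the standard layout (record name in columns 1-6, and x, y and z in columns 31-38, 39-46 and 47-54), the fields can be taken by position instead. The helper name `coord_fields` is hypothetical and not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the record name and the x, y, z
# coordinates of one PDB line, read by column position.
# Returns an empty list for any line that is not ATOM or HETATM.
sub coord_fields {
    my ($line) = @_;

    # The coordinates end at column 54; shorter lines cannot hold them
    return unless length $line >= 54;

    # A6  = columns 1-6 (record name), x24 = skip columns 7-30,
    # A8 A8 A8 = columns 31-38, 39-46, 47-54 (x, y, z)
    my ( $record, @xyz ) = unpack 'A6 x24 A8 A8 A8', $line;

    return unless $record eq 'ATOM' or $record eq 'HETATM';

    s/^\s+// for @xyz;    # 'A' strips trailing spaces only
    return ( $record, @xyz );
}

# The two sample lines from the question
for my $line (
    'ATOM    946  OH  TYR A  59      37.734  36.478  24.541  1.00  0.00           O',
    'ATOM    947  H   TYR A  59      33.478  35.320  18.896  1.00  0.00           H',
) {
    my @fields = coord_fields($line) or next;
    print "@fields\n";
}
```

In the answer's loop, a call such as `my @fields = coord_fields($_)` could stand in for the `@fields[0,6,7,8]` slice on ATOM and HETATM lines, while the HEADER handling stays as it is.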


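Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.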
