How to extract specific values from a file and print them into another file

I have PDB files that are named by their PDB ID, for instance 2KRJ.pdb. I want to extract from them only the lines that begin with ATOM or HETATM, and copy them into a new file with the same name and a .txt extension, for instance 2KRJ.txt.

I know how to extract those lines, but I am having trouble copying them to another file.

This is the script that I have written so far for extracting:

#!/usr/bin/perl -w

$dirname = '.';
opendir(DIR, $dirname) or die "cannot open directory";
@files = grep(/\.pdb$/,readdir(DIR));

foreach $files ( @files ) {

    open (FH, $files) or die "could not open $files\n";
    @file_each = <FH>;
    #print @file_each;
    #print "$file\n";
    close FH;

    #$dir_sz = scalar @files;
    #print "$dir_sz\n";
    close DIR;

    my @ac        = ();
    my @dr        = ();
    my @os        = ();
    my @names     = ();
    my @ion_names = ();
    my $flag      = 0;

    for ( my $line = 0; $line <= $#file_each; $line++ ) {  # loop reading each line from the @file up to the end of file  

        chomp( $file_each[$line] );

        if ( $file_each[$line] =~ /^HEADER/ ) {

            my @id       = split '\s+', $file_each[$line];
            my $filename = pop @id;
            $filename    = "$filename.pdb";

            while ( $file_each[$line] !~ /^END/ ) { # read the lines until you get the symbol 'END'


                if ( $file_each[$line] =~/^ATOM|^HETATM/ ) {

                    $file_each[$line] =~ s/^ATOM|^HETATM//;

                    @xyz = split '\s+', $file_each[$line];
                    chomp @xyz[0,6,7,8];
                    print join (':', @xyz), "\n";

                    push @coord, @xyz[0,6,7,8];
                    print "@coord\n";

                open (OUTPUT, ">$filename.txt"); 
                print(OUTPUT "@coord\n"); 
                close OUTPUT;

The problem is that this script does not print the first column, and the output is disorganised: the rows do not have four columns each.

The lines that I am trying to extract look like this:

ATOM    946  OH  TYR A  59      37.734  36.478  24.541  1.00  0.00           O  
ATOM    947  H   TYR A  59      33.478  35.320  18.896  1.00  0.00           H  

And I am trying to change it so that the new text file contains only this:

ATOM   37.734  36.478  24.541          
ATOM   33.478  35.320  18.896 

But I am getting this:

 .326 2.859  229 -18.940 4.490  230 -23.744 0.422  230 -24.558 -0.785  230  
 -24.256 -1.547  230 -23.137 -2.012  230 -24.338 -1.681  230 -25.135 -2.969   
 230 -26.307 -2.940  230 -24.589 -4.016  230 -22.773 0.364  231 -25.257   
-1.661  231 -25.103 -2.360  231 -26.141 -3.471  231 -27.309 -3.282  231   
-25.252 -1.396  
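The missing first column can be reproduced in isolation. The sketch below assumes the script shown above is what actually ran, and uses one of the sample lines from the question. It shows that removing the record name first and then splitting on `\s+` leaves an empty string in field 0, while splitting the untouched line keeps ATOM there.

```perl
use strict;
use warnings;

# One of the sample lines from the question.
my $line = 'ATOM    946  OH  TYR A  59      37.734  36.478  24.541  1.00  0.00           O';

# What the script does: strip the record name, then split on whitespace.
( my $stripped = $line ) =~ s/^ATOM|^HETATM//;
my @xyz = split /\s+/, $stripped;

# The remaining text starts with spaces, so split returns an empty
# leading field: $xyz[0] is '' and the record name is gone.
print join( ':', @xyz[ 0, 6, 7, 8 ] ), "\n";    # prints ":37.734:36.478:24.541"

# Splitting the untouched line keeps the record name in field 0.
my @fields = split ' ', $line;
print "@fields[0,6,7,8]\n";                     # prints "ATOM 37.734 36.478 24.541"
```

Two other details of the script as posted probably add to the disorder: @coord is never emptied, so every print writes all the values collected so far on a single line, and the output file is reopened with `>` inside the loop, which truncates it each time.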

This program will do as you ask.

Do you see how trying to hack an existing program leads you to write way too much code, and so increase the chances of a bug? Please learn to program in Perl and stop relying on freebies from generous souls.

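use strict;
use warnings 'all';
use autodie;    # open dies with a message on failure

for my $pdb ( glob '*.pdb' ) {

    open my $fh, '<', $pdb;
    my $out_fh;

    while ( <$fh> ) {
        # Split on whitespace; skip blank lines
        next unless my @fields = split;

        if ( $fields[0] eq 'HEADER' ) {
            # The last field of the HEADER line is the PDB ID
            open $out_fh, '>', "$fields[-1].txt";
        }
        elsif ( $fields[0] eq 'ATOM' or $fields[0] eq 'HETATM' ) {

            unless ( $out_fh ) {
                warn qq{No ID found for file "$pdb"};
                last;    # no output file, so give up on this input file
            }

            # Record name followed by the x, y and z coordinates
            print $out_fh "@fields[0,6,7,8]\n";
        }
    }
}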

Output

ATOM 15.200 27.271 13.911
ATOM 15.336 27.312 15.415
ATOM 16.364 26.299 15.932
ATOM 16.167 25.081 15.787
ATOM 14.019 26.968 16.088
ATOM 14.198 27.038 17.607
ATOM 13.515 25.568 15.575
ATOM 14.524 28.415 18.088
ATOM 17.456 26.771 16.532
ATOM 18.424 25.815 17.028
ATOM 19.122 26.165 18.302
ATOM 19.066 27.314 18.764
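Splitting on whitespace works for lines like the samples above, but PDB coordinate records are fixed-width, so neighbouring fields can run together (a chain ID followed by a four-digit residue number, for example) and shift the field indices. As a minimal sketch of an alternative, assuming the files follow the standard column layout (record name in columns 1-6, x, y and z in columns 31-38, 39-46 and 47-54), the values can be taken by position with unpack. The helper name atom_coords is hypothetical and not part of the program above.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the record name and x, y, z out of one
# ATOM/HETATM line by column position rather than by whitespace.
sub atom_coords {
    my ($line) = @_;
    return unless $line =~ /^(?:ATOM  |HETATM)/;
    return if length $line < 54;    # too short to hold the z field

    # Columns (1-based): 1-6 record name, 31-38 x, 39-46 y, 47-54 z.
    my ( $record, $x, $y, $z ) = unpack 'A6 x24 A8 A8 A8', $line;

    # The numeric fields are right-justified, so trim leading blanks.
    s/^\s+// for $x, $y, $z;
    return ( $record, $x, $y, $z );
}

# Sample line taken from the question.
my $sample = 'ATOM    946  OH  TYR A  59      37.734  36.478  24.541  1.00  0.00           O';
print join( ' ', atom_coords($sample) ), "\n";    # prints "ATOM 37.734 36.478 24.541"
```

In the program above, the print statement could then use the list returned by this helper in place of @fields[0,6,7,8]; whether that is worth doing depends on whether your files contain such run-together fields.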


