How to extract specific values from a file and print them into another file
I have PDB files that are named by their PDB ID, for instance 2KRJ.pdb. I want to extract from them only the lines that begin with ATOM or HETATM, and copy them into a new file with the same name and a .txt extension, for instance 2KRJ.txt.
I know how to extract those lines but I am having trouble copying them to another file.
This is the script that I have written so far for extracting:
#!/usr/bin/perl -w
$dirname = '.';
opendir(DIR, $dirname) or die "cannot open directory";
@files = grep(/\.pdb$/,readdir(DIR));
foreach $files ( @files ) {
    open (FH, $files) or die "could not open $files\n";
    @file_each = <FH>;
    #print @file_each;
    #print "$file\n";
    close FH;
    #$dir_sz = scalar @files;
    #print "$dir_sz\n";
    close DIR;
    my @ac = ();
    my @dr = ();
    my @os = ();
    my @names = ();
    my @ion_names = ();
    my $flag = 0;
    for ( my $line = 0; $line <= $#file_each; $line++ ) { # loop reading each line from the @file up to the end of file
        chomp( $file_each[$line] );
        if ( $file_each[$line] =~ /^HEADER/ ) {
            my @id = split '\s+', $file_each[$line];
            my $filename = pop @id;
            $filename = "$filename.pdb";
            while ( $file_each[$line] !~ /^END/ ) { # read the lines until you get the symbol 'END'
                $line++;
                if ( $file_each[$line] =~/^ATOM|^HETATM/ ) {
                    $file_each[$line] =~ s/^ATOM|^HETATM//;
                    @xyz = split '\s+', $file_each[$line];
                    chomp @xyz[0,6,7,8];
                    print join (':', @xyz), "\n";
                    push @coord, @xyz[0,6,7,8];
                    print "@coord\n";
                }
                open (OUTPUT, ">$filename.txt");
                print(OUTPUT "@coord\n");
                close OUTPUT;
            }
        }
    }
}
The problem is that this script does not print the first column, and the output is disorganised: the rows do not come out with four columns each.
The lines that I am trying to extract look like this:
ATOM 946 OH TYR A 59 37.734 36.478 24.541 1.00 0.00 O
ATOM 947 H TYR A 59 33.478 35.320 18.896 1.00 0.00 H
And I am trying to change it so that the new text file contains only this:
ATOM 37.734 36.478 24.541
ATOM 33.478 35.320 18.896
But I am getting this:
.326 2.859 229 -18.940 4.490 230 -23.744 0.422 230 -24.558 -0.785 230
-24.256 -1.547 230 -23.137 -2.012 230 -24.338 -1.681 230 -25.135 -2.969
230 -26.307 -2.940 230 -24.589 -4.016 230 -22.773 0.364 231 -25.257
-1.661 231 -25.103 -2.360 231 -26.141 -3.471 231 -27.309 -3.282 231
-25.252 -1.396
This will do as you ask.
Do you see how trying to hack an existing program leads you to write way too much code, and so increase the chances of a bug? Please learn to program in Perl and stop relying on freebies from generous souls.
use strict;
use warnings 'all';
use autodie;

for my $pdb ( glob '*.pdb' ) {

    open my $fh, '<', $pdb;
    my $out_fh;

    while ( <$fh> ) {

        # Skip blank lines; otherwise split the line on whitespace
        next unless my @fields = split;

        if ( $fields[0] eq 'HEADER' ) {
            # The last field of the HEADER line is the PDB ID
            open $out_fh, '>', "$fields[-1].txt";
        }
        elsif ( $fields[0] eq 'ATOM' or $fields[0] eq 'HETATM' ) {

            unless ( $out_fh ) {
                warn qq{No ID found for file "$pdb"};
                last;
            }

            # Record name followed by the x, y, z coordinates
            print $out_fh "@fields[0,6,7,8]\n";
        }
    }
}
ATOM 15.200 27.271 13.911
ATOM 15.336 27.312 15.415
ATOM 16.364 26.299 15.932
ATOM 16.167 25.081 15.787
ATOM 14.019 26.968 16.088
ATOM 14.198 27.038 17.607
ATOM 13.515 25.568 15.575
ATOM 14.524 28.415 18.088
ATOM 17.456 26.771 16.532
ATOM 18.424 25.815 17.028
ATOM 19.122 26.165 18.302
ATOM 19.066 27.314 18.764
...
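One caveat about splitting on whitespace: PDB records are fixed-column, so neighbouring fields can run together with no space between them (for example, two large negative coordinates), and the field indices then shift. Below is a minimal sketch of a variant that reads the record name from columns 1-6 and x, y, z from columns 31-54, and names the output after the input file instead of the HEADER line. The helper names extract_coords and convert_pdb are made up for this illustration, and the sketch assumes the files follow the standard PDB column layout.

```perl
use strict;
use warnings;

# Hypothetical helper: return the record name and x, y, z for one
# ATOM/HETATM line, reading by column position instead of whitespace.
sub extract_coords {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /^(?:ATOM|HETATM)/;
    return if length($line) < 54;    # too short to hold all three coordinates

    # A6 = record name (cols 1-6), x24 skips cols 7-30,
    # then three 8-character fields for x, y, z (cols 31-54).
    my ( $record, @xyz ) = unpack 'A6 x24 A8 A8 A8', $line;
    s/^\s+// for @xyz;               # drop the leading padding of each field
    return ( $record, @xyz );
}

# Hypothetical helper: write FOO.txt next to FOO.pdb.
sub convert_pdb {
    my ($pdb) = @_;
    ( my $txt = $pdb ) =~ s/\.pdb$/.txt/;

    open my $in,  '<', $pdb or die qq{Cannot open "$pdb": $!};
    open my $out, '>', $txt or die qq{Cannot write "$txt": $!};

    while ( my $line = <$in> ) {
        my @fields = extract_coords($line) or next;    # skip other records
        print $out "@fields\n";
    }

    close $out or die qq{Cannot close "$txt": $!};
    return $txt;
}

convert_pdb($_) for glob '*.pdb';
```

Run in the directory that holds the .pdb files, this should turn 2KRJ.pdb into 2KRJ.txt regardless of what the HEADER line says. If your files are not strictly column-aligned, the whitespace split above is the safer choice.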