I want to sort data from a PDB file in Perl or Python

I want to print the atom sequences used to describe ribose puckering.

Script in Perl:

   use strict;
   use warnings;

   # Input file: one PDB file name per line
   open(my $list_fh, '<', 'List_NAD_ID.txt') or die $!;
   my @file1 = <$list_fh>;
   close $list_fh;

   my $OutputDir = 'C:\Users\result';   # output directory path

   foreach my $line (@file1)
   {
       chomp $line;
       open(my $fh,  '<', $line)                   or die "$line: $!";
       open(my $out, '>', "$OutputDir/$line.pdb") or die $!;
       print $out "\n", "$line  ", "\n";

       # Keep only the five ribose ring atoms (duplicate alternatives removed).
       # Matching lines are printed in file order; nothing is re-sorted here.
       while (my $record = <$fh>)
       {
           if ($record =~ /^HETATM.{7}(?:C4B|O4B|C1B|C2B|C3B)/)
           {
               print $out $record;
           }
       }
       close $fh;
       close $out;
       print "Completed\n";
   }
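The regex works because PDB is a fixed-column format: after the six-character record name, `.{7}` skips the serial-number field (columns 7-11), column 12 and the leading blank of the atom-name field, so the three-letter name is matched from column 14. A minimal Python sketch of the same filter, assuming standard PDB records that start in column 1, reads the fields by position instead of by regex; the function name `ribose_atom` is hypothetical.

```python
# Ring atoms of the ribose named with the "B" suffix in the sample file
RIBOSE_ATOMS = {"C1B", "C2B", "C3B", "C4B", "O4B"}


def ribose_atom(line):
    """Return (chain_id, atom_name) for a ribose HETATM record, else None.

    Uses the fixed PDB columns (1-based): record name 1-6, atom name 13-16,
    chain identifier 22. Assumes the record starts in column 1.
    """
    if not line.startswith("HETATM") or len(line) < 22:
        return None
    atom_name = line[12:16].strip()  # columns 13-16
    if atom_name not in RIBOSE_ATOMS:
        return None
    return line[21], atom_name  # column 22 is the chain ID


if __name__ == "__main__":
    record = "HETATM 3934  C4B NAD A 255      10.495 -11.444   1.016  1.00 50.46           C"
    print(ribose_atom(record))  # ('A', 'C4B')
```

Reading the chain ID from column 22 at the same time is what makes the later chain-wise grouping straightforward.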

I have a PDB input file:

 HETATM 3934  C4B NAD A 255      10.495 -11.444   1.016  1.00 50.46           C  
 HETATM 3935  O4B NAD A 255      10.768 -11.615   2.448  1.00 48.17           O  
 HETATM 3936  C3B NAD A 255      10.445 -12.867   0.431  1.00 49.69           C  
 HETATM 3938  C2B NAD A 255      10.431 -13.759   1.675  1.00 48.46           C  
 HETATM 3940  C1B NAD A 255      11.323 -12.898   2.593  1.00 46.97           C  
 HETATM 3978  C4B NAD B 256      14.596   1.733  33.219  1.00 50.48           C  
 HETATM 3979  O4B NAD B 256      14.370   0.578  32.357  1.00 48.22           O  
 HETATM 3980  C3B NAD B 256      14.940   1.177  34.603  1.00 49.64           C  
 HETATM 3982  C2B NAD B 256      14.987  -0.347  34.401  1.00 48.48           C  
 HETATM 3984  C1B NAD B 256      14.066  -0.517  33.189  1.00 46.98           C  

Expected result:

I want to copy the following atoms and then paste them in the following sequence. Everything should be grouped chain by chain (chain A, B, C, ...).

 HETATM 3934  C4B NAD A 255      10.495 -11.444   1.016  1.00 50.46           C
 HETATM 3935  O4B NAD A 255      10.768 -11.615   2.448  1.00 48.17           O
 HETATM 3940  C1B NAD A 255      11.323 -12.898   2.593  1.00 46.97           C
 HETATM 3938  C2B NAD A 255      10.431 -13.759   1.675  1.00 48.46           C
 HETATM 3935  O4B NAD A 255      10.768 -11.615   2.448  1.00 48.17           O
 HETATM 3940  C1B NAD A 255      11.323 -12.898   2.593  1.00 46.97           C
 HETATM 3938  C2B NAD A 255      10.431 -13.759   1.675  1.00 48.46           C
 HETATM 3936  C3B NAD A 255      10.445 -12.867   0.431  1.00 49.69           C
 .
 .
 .

I have five paste sequences, v0, v1, v2, v3 and v4 (the five endocyclic torsion angles ν0-ν4 of the ribose ring).

The sequences are:

v0: C4B-O4B-C1B-C2B
v1: O4B-C1B-C2B-C3B
v2: C1B-C2B-C3B-C4B
v3: C2B-C3B-C4B-O4B
v4: C3B-C4B-O4B-C1B
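Once the ribose records of one chain are indexed by atom name, producing the requested output is a lookup per sequence entry, so atoms are repeated exactly as in the expected result. A minimal sketch, assuming a dict that maps atom names to their original record lines; `SEQUENCES` and `records_in_sequence` are hypothetical names.

```python
# The five sequences (v0..v4) from the question, in order
SEQUENCES = [
    ["C4B", "O4B", "C1B", "C2B"],  # v0
    ["O4B", "C1B", "C2B", "C3B"],  # v1
    ["C1B", "C2B", "C3B", "C4B"],  # v2
    ["C2B", "C3B", "C4B", "O4B"],  # v3
    ["C3B", "C4B", "O4B", "C1B"],  # v4
]


def records_in_sequence(records_by_name, sequences=SEQUENCES):
    """Return the record lines of one chain, repeated in sequence order.

    records_by_name maps an atom name (e.g. "C4B") to its full PDB line.
    Raises KeyError if an atom required by a sequence is missing.
    """
    ordered = []
    for sequence in sequences:
        for atom_name in sequence:
            ordered.append(records_by_name[atom_name])
    return ordered
```

Each chain yields 5 x 4 = 20 lines; raising on a missing atom is a deliberate choice so an incomplete ring is not silently printed.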

For each of these sequences, I want to print the atom records in the order given above. I have also edited the expected result.

I want to sort the data according to the sequences above, chain by chain. My Perl attempt does not give the expected result. I am new to Perl and Python, so any help solving this would be appreciated.

It is like a matrix problem:

For example, with five values 1, 2, 3, 4, 5:

Row 1 - 1  2  3  4
Row 2 - 2  3  4  5
Row 3 - 3  4  5  1
Row 4 - 4  5  1  2
Row 5 - 5  1  2  3
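Each row is the same cyclic list shifted by one position and cut to four entries, so five values give five rows, the last being 5 1 2 3. A minimal sketch of that rotation; `cyclic_windows` is a hypothetical helper, and applying it to the ring order C4B, O4B, C1B, C2B, C3B (an assumption read off the listed sequences) reproduces v0 to v4.

```python
def cyclic_windows(values, width=4):
    """Return every window of `width` consecutive items, wrapping around the end."""
    n = len(values)
    return [[values[(start + offset) % n] for offset in range(width)]
            for start in range(n)]


if __name__ == "__main__":
    for row in cyclic_windows([1, 2, 3, 4, 5]):
        print(*row)
    # 1 2 3 4
    # 2 3 4 5
    # 3 4 5 1
    # 4 5 1 2
    # 5 1 2 3
    print(cyclic_windows(["C4B", "O4B", "C1B", "C2B", "C3B"])[0])
    # ['C4B', 'O4B', 'C1B', 'C2B']
```

Using the modulo index instead of hard-coding the rows means the same helper works for any ring size or window width.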

I want to print the data like that for each chain, from chain A to Z.
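A self-contained sketch of the whole task in plain Python, without Biopython: keep the ribose HETATM records, group them by chain ID (column 22), then emit each chain's records in v0-v4 order with chains sorted alphabetically. It assumes standard PDB columns and one NAD ribose per chain, as in the sample input; `sort_ribose_records` and the command-line usage are hypothetical.

```python
import sys
from collections import defaultdict

# Ribose ring order; 4-atom cyclic windows over it give v0..v4
RING_ORDER = ["C4B", "O4B", "C1B", "C2B", "C3B"]


def sort_ribose_records(lines):
    """Return ribose HETATM lines grouped by chain and repeated in v0..v4 order.

    Assumes standard PDB columns and one ribose (one NAD) per chain.
    Chains missing any ring atom are skipped.
    """
    by_chain = defaultdict(dict)
    for line in lines:
        if line.startswith("HETATM") and len(line) >= 22:
            atom_name = line[12:16].strip()  # columns 13-16
            if atom_name in RING_ORDER:
                by_chain[line[21]][atom_name] = line.rstrip("\n")

    n = len(RING_ORDER)
    output = []
    for chain_id in sorted(by_chain):  # chain A, B, C, ...
        atoms = by_chain[chain_id]
        if len(atoms) < n:
            continue  # incomplete ring in this chain
        for start in range(n):  # v0 .. v4
            for offset in range(4):
                output.append(atoms[RING_ORDER[(start + offset) % n]])
    return output


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sort_ribose.py input.pdb > output.pdb
    with open(sys.argv[1]) as handle:
        for record in sort_ribose_records(handle):
            print(record)
```

Because each record line is stored unchanged, the output lines are exact copies of the input records, which matches the "copy and paste" intent of the question.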

If you want to use Biopython, you have to create all the chains and insert the atoms into them. However, the atoms must be held in a Residue for this to work:

from Bio.PDB import PDBParser, PDBIO
from Bio.PDB.Chain import Chain
from Bio.PDB.Residue import Residue

# This is your source structure
pdb = PDBParser().get_structure("UGLY", "ugly.pdb")

# Now cycle through all your chains
for chain in pdb.get_chains():
    # Load all the atoms and residues in each chain
    atoms = list(chain.get_atoms())
    residues = list(chain.get_residues())

    # Start a new structure to save the output, keeping the original chain ID
    io = PDBIO()
    this_chain = Chain(chain.id)
    # Assumes the first residue of the chain is the NAD (true for the sample input)
    this_residue = Residue(residues[0].id,
                           residues[0].resname,
                           residues[0].segid)

    # Now get the atoms in your source structure that match your sort key
    # You should refactor this out to a function that accepts a sort key
    #  and returns a list of atoms or a residue with the atoms added.
    for atom_name in "O4B-C1B-C2B-C3B".split("-"):
        for atom in atoms:
            if atom.get_name() == atom_name:
                this_residue.add(atom)

    # Add the residue to the chain and save it
    this_chain.add(this_residue)
    io.set_structure(this_chain)
    # One output file per chain, named after the chain ID; change the name as needed
    io.save(f"sorted_{chain.id}.pdb")
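The answer's comment suggests refactoring the atom selection into a function that accepts a sort key. A minimal sketch of that helper, assuming only that each atom object has a `get_name()` method (as Bio.PDB `Atom` objects do); `atoms_for_key` is a hypothetical name. It returns a plain list instead of adding atoms to a Residue: as far as I know, Bio.PDB refuses to add a child whose ID a residue already holds, so the repeated atoms of all five sequences could not share one Residue anyway.

```python
def atoms_for_key(atoms, sort_key, separator="-"):
    """Return the atoms whose names appear in sort_key, in sort-key order.

    `atoms` is any iterable of objects with a get_name() method, e.g. the
    Bio.PDB atoms of one chain. Every atom with a matching name is included;
    names with no matching atom are skipped.
    """
    atoms = list(atoms)  # materialize so generators can be scanned once per name
    selected = []
    for atom_name in sort_key.split(separator):
        for atom in atoms:
            if atom.get_name() == atom_name:
                selected.append(atom)
    return selected
```

With Biopython this could be called once per sequence, for example `atoms_for_key(chain.get_atoms(), "C4B-O4B-C1B-C2B")`, and the resulting lists written out however you prefer.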
