How to read files from an input directory and save each output file under the same name in an output folder in Python

I want my code to take its input files from an input directory and save each result, under the same name as the input file, in a separate output directory.

Script:

import sys
import glob
import errno
import os


d = {}
chainIDs = ('A', 'B')
atomIDs = ('C4B', 'O4B', 'C1B', 'C2B', 'C3B', 'C4B', 'O4B', 'C1B')
count = 0
for doc in os.listdir('/C:/Users/Vishnu/Desktop/Test_folder/Input'):
doc1 = "doc_path" + doc
doc2 = "/C:/Users/Vishnu/Desktop/Test_folder/Output" + doc1
if doc1.endswith(".pdb"):
with open(doc) as pdbfile:
       single_line = ''.join([line for line in f])
       single_space = ' '.join(single_line.split())
       for line in map(str.rstrip, pdbfile):
            if line[:6] != "HETATM":
                continue
            chainID = line[21:22]
            atomID = line[13:16].strip()
            if chainID not in chainIDs:
                continue
            if atomID not in atomIDs:
                continue
            try:
                d[chainID][atomID] = line
            except KeyError:
                d[chainID] = {atomID: line}

    n = 4
    for chainID in chainIDs:
        for i in range(len(atomIDs)-n+1):
            for j in range(n):
                   with open(doc2.format(count) , "w") as doc2:
                         doc2.write(d[chainID][atomIDs[i+j]])
                         count += 1   

else:
continue

Running the code above gives the error below. I am new to Python and still learning; can anyone please help? Error:

with open(doc) as pdbfile:
    ^
IndentationError: expected an indented block
>>> 

Input file:

HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C  
HETATM15208  O4B NAD A 501      46.466 100.713   8.371  1.00 11.48           O  
HETATM15209  C3B NAD A 501      47.659  99.689   6.567  1.00 11.48           C  
HETATM15211  C2B NAD A 501      46.447  98.835   6.988  1.00 11.48           C  
HETATM15213  C1B NAD A 501      46.221  99.300   8.426  1.00 11.48           C  
HETATM15252  C4B NAD B 501      36.455 115.053  36.671  1.00 11.25           C  
HETATM15253  O4B NAD B 501      35.930 114.469  35.492  1.00 11.25           O  
HETATM15254  C3B NAD B 501      35.307 115.837  37.367  1.00 11.25           C  
HETATM15256  C2B NAD B 501      34.172 114.876  37.039  1.00 11.25           C  
HETATM15258  C1B NAD B 501      34.524 114.613  35.551  1.00 11.25           C  
HETATM15297  C4B NAD C 501      98.229 130.106  18.332  1.00 12.28           C  
HETATM15298  O4B NAD C 501      98.083 131.545  18.199  1.00 12.28           O  
HETATM15299  C3B NAD C 501      99.346 129.675  17.343  1.00 12.28           C  
HETATM15301  C2B NAD C 501     100.220 130.922  17.375  1.00 12.28           C  
HETATM15303  C1B NAD C 501      99.125 132.008  17.317  1.00 12.28           C  
HETATM15342  C4B NAD D 501      77.335 156.939  25.788  1.00 11.99           C  
HETATM15343  O4B NAD D 501      78.705 156.544  25.901  1.00 11.99           O  
HETATM15344  C3B NAD D 501      77.106 158.059  26.824  1.00 11.99           C  
HETATM15346  C2B NAD D 501      78.536 158.632  26.878  1.00 11.99           C  
HETATM15348  C1B NAD D 501      79.351 157.345  26.900  1.00 11.99           C  

Column 2 is the atom name, column 3 is the residue name (NAD), and column 4 (A, B, C, D) is the chain ID:
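As a minimal sketch of how these fields can be read, the snippet below slices one HETATM line by fixed character positions, which is how the PDB format lays out its records. The function name parse_hetatm is hypothetical and only used for illustration; the sample line is the first line of the input file above.

```python
def parse_hetatm(line):
    """Return (atom_name, residue_name, chain_id) for a HETATM line, else None."""
    if not line.startswith("HETATM") or len(line) < 22:
        return None
    atom_name = line[12:16].strip()     # PDB columns 13-16
    residue_name = line[17:20].strip()  # PDB columns 18-20
    chain_id = line[21]                 # PDB column 22
    return atom_name, residue_name, chain_id


sample = "HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C  "
print(parse_hetatm(sample))  # ('C4B', 'NAD', 'A')
```

Slicing by position rather than calling split() matters here because the record name and the serial number run together (HETATM15207), so whitespace splitting would not give stable column numbers.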

Expected output for each chain ID. Chain IDs may run from A to Z, but are mostly A to H:

For chain A:

HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C 
HETATM15208  O4B NAD A 501      46.466 100.713   8.371  1.00 11.48           O  
HETATM15213  C1B NAD A 501      46.221  99.300   8.426  1.00 11.48           C  
HETATM15211  C2B NAD A 501      46.447  98.835   6.988  1.00 11.48           C   

HETATM15208  O4B NAD A 501      46.466 100.713   8.371  1.00 11.48           O  
HETATM15213  C1B NAD A 501      46.221  99.300   8.426  1.00 11.48           C  
HETATM15211  C2B NAD A 501      46.447  98.835   6.988  1.00 11.48           C  
HETATM15209  C3B NAD A 501      47.659  99.689   6.567  1.00 11.48           C  

HETATM15213  C1B NAD A 501      46.221  99.300   8.426  1.00 11.48           C  
HETATM15211  C2B NAD A 501      46.447  98.835   6.988  1.00 11.48           C  
HETATM15209  C3B NAD A 501      47.659  99.689   6.567  1.00 11.48           C  
HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C  

HETATM15211  C2B NAD A 501      46.447  98.835   6.988  1.00 11.48           C  
HETATM15209  C3B NAD A 501      47.659  99.689   6.567  1.00 11.48           C  
HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C  
HETATM15208  O4B NAD A 501      46.466 100.713   8.371  1.00 11.48           O  

HETATM15209  C3B NAD A 501      47.659  99.689   6.567  1.00 11.48           C  
HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C  
HETATM15208  O4B NAD A 501      46.466 100.713   8.371  1.00 11.48           O  
HETATM15213  C1B NAD A 501      46.221  99.300   8.426  1.00 11.48           C  
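The five blocks above are windows of four consecutive atoms taken around the ring, wrapping back to the start. A minimal sketch of that windowing, assuming the ring order C4B, O4B, C1B, C2B, C3B read off the first blocks, could look like this (cyclic_windows is a hypothetical helper name):

```python
ring = ("C4B", "O4B", "C1B", "C2B", "C3B")  # ring order taken from the expected output


def cyclic_windows(items, size):
    """Yield one window of `size` consecutive items per start position, wrapping around."""
    count = len(items)
    for start in range(count):
        yield tuple(items[(start + offset) % count] for offset in range(size))


windows = list(cyclic_windows(ring, 4))
for window in windows:
    print(" ".join(window))
# C4B O4B C1B C2B
# O4B C1B C2B C3B
# C1B C2B C3B C4B
# C2B C3B C4B O4B
# C3B C4B O4B C1B
```

Using the modulo operator gives the same windows as repeating the first three atom names at the end of the tuple, as the script does.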

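The IndentationError appears because the line with open(doc) as pdbfile: is not indented under if doc1.endswith(".pdb"):. Python expects the body of every for, if and with statement to be indented one level deeper than its header, and all lines of a body must use the same indentation. The version below also builds the input and output paths from the folder plus the file name, so each output file gets the same name as its input file.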

Hope this helps!
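For illustration only, here is a small sketch that reproduces this kind of error without touching any files. It uses the built-in compile() to check two tiny snippets; the helper name check is hypothetical, and the exact wording of the error message differs between Python versions.

```python
bad = 'if True:\nprint("hello")\n'        # body not indented under the if
good = 'if True:\n    print("hello")\n'   # body indented one level


def check(source):
    """Compile the snippet and report whether Python accepts its indentation."""
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except IndentationError as exc:
        return f"IndentationError: {exc.msg}"


print(check(bad))
print(check(good))
```

The first snippet fails for the same reason as the script in the question: a line ending in a colon is followed by a line at the same indentation level.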

import os

chainIDs = ('A', 'B')
# Ring atoms in cyclic order; the first three are repeated so every window can wrap around.
atomIDs = ('C4B', 'O4B', 'C1B', 'C2B', 'C3B', 'C4B', 'O4B', 'C1B')
n = 4  # number of atoms per window
doc_path = r'C:\Users\Vishnu\Desktop\Test_folder\Input'
tar_path = r'C:\Users\Vishnu\Desktop\Test_folder\Output'
os.makedirs(tar_path, exist_ok=True)  # create the output folder if it is missing

for doc in os.listdir(doc_path):
    if not doc.endswith(".pdb"):
        continue
    doc1 = os.path.join(doc_path, doc)  # input file
    doc2 = os.path.join(tar_path, doc)  # output file with the same name
    print(doc1, doc2)

    d = {}  # reset for every file: d[chainID][atomID] = line
    with open(doc1) as pdbfile:
        for line in map(str.rstrip, pdbfile):
            if line[:6] != "HETATM":
                continue
            chainID = line[21:22]
            atomID = line[13:16].strip()
            if chainID not in chainIDs or atomID not in atomIDs:
                continue
            d.setdefault(chainID, {})[atomID] = line

    # Open the output once per input file. Reopening it in "w" mode inside the
    # loops would truncate it each time and keep only the last line written.
    with open(doc2, "w") as out:
        for chainID in chainIDs:
            if chainID not in d:
                continue  # this chain has no matching atoms in the file
            for i in range(len(atomIDs) - n + 1):
                for j in range(n):
                    # rstrip removed the newline above, so add it back
                    out.write(d[chainID][atomIDs[i + j]] + "\n")
                out.write("\n")  # blank line between windows
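The same input-to-output mapping can also be written with pathlib, which avoids joining paths by hand. The sketch below is only an illustration under stated assumptions: mirror_files and keep_hetatm are hypothetical names, the transform simply keeps HETATM records, and the demonstration runs in a temporary directory rather than in the Test_folder paths from the question.

```python
import tempfile
from pathlib import Path


def mirror_files(input_dir, output_dir, transform, pattern="*.pdb"):
    """Apply `transform` to each matching input file and save it under the same name."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    written = []
    for source in sorted(input_dir.glob(pattern)):
        target = output_dir / source.name  # same file name, different folder
        target.write_text(transform(source.read_text()))
        written.append(target)
    return written


def keep_hetatm(text):
    """Hypothetical transform: keep only the HETATM records."""
    return "".join(line for line in text.splitlines(keepends=True)
                   if line.startswith("HETATM"))


# Demonstration in a throwaway directory, so no real folders are touched.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "Input").mkdir()
    (Path(root) / "Input" / "a.pdb").write_text("HEADER demo\nHETATM demo record\n")
    (Path(root) / "Input" / "notes.txt").write_text("ignored\n")
    results = mirror_files(Path(root) / "Input", Path(root) / "Output", keep_hetatm)
    names = [path.name for path in results]
    content = results[0].read_text()

print(names)            # ['a.pdb']
print(content, end="")  # HETATM demo record
```

Passing the per-file processing in as a function keeps the directory handling separate from the PDB logic, so the windowing step could be swapped in for keep_hetatm without changing the loop.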
