简体   繁体   中英

Please help me optimize my Python code

Beginner here! Forgive me in advance for raising what is probably an incredibly simple problem.

I've been trying to put together a Python script that runs multiple find-and-replace actions and a few similar things on a specified plain-text file. It works, but from a programming perspective I doubt it works well. How would I best go about optimizing the actions made upon the 'outtext' variable? At the moment it's basically doing a very similar thing four times over...

import binascii
import re
import struct
import sys

infile = sys.argv[1]
charenc = sys.argv[2]
outFile=infile+'.tex'

findreplace = [
('TERM1', 'TERM2'),
('TERM3', 'TERM4'),
('TERM5', 'TERM6'),
]

inF = open(infile,'rb')
s=unicode(inF.read(),charenc)
inF.close()

# THIS IS VERY MESSY.
for couple in findreplace:
    outtext=s.replace(couple[0],couple[1])
    s=outtext

for couple in findreplace:
    outtext=re.compile('Title: (.*)', re.I).sub(r'\\title'+ r'{\1}', s)
    s=outtext

for couple in findreplace:
    outtext=re.compile('Author: (.*)', re.I).sub(r'\\author'+ r'{\1}', s)
    s=outtext

for couple in findreplace:
    outtext=re.compile('Date: (.*)', re.I).sub(r'\\date'+ r'{\1}', s)
    s=outtext
# END MESSY SECTION.

outF = open(outFile,'wb')
outF.write(outtext.encode('utf-8'))
outF.close()

For the last three replacements, the for-loop seems to be unnecessary, since the loop body doesn't use the loop variable. Also it is not necessary to assign the result to a temporary outtext variable, it can be directly assigned to s .

These changes make the code much more concise:

for couple in findreplace:
    s = s.replace(couple[0],couple[1])

s = re.compile('Title: (.*)', re.I).sub(r'\\title {\1}', s)
s = re.compile('Author: (.*)', re.I).sub(r'\\author {\1}', s)
s = re.compile('Date: (.*)', re.I).sub(r'\\date {\1}', s)

Some possibilities:

  • Do the re.compile outside the loop.
  • Use only one loop and do all four things inside it.
  • Use one regex, which matches on either Title, Author or Date.
  • Don't do the last three replaces inside a loop, since they do not use the couple variable.

your code is not so bad. If you want to optimise for speed then I would look into compiling your regex's at the start of the program.

Also, If you want to tidy up your code then chain all your subs together in one loop.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM