
Python, string.replace() and \n

(Edit: the script seems to work for the others here who are trying to help. Is it because I'm running Python 2.7? I'm really at a loss...)

I have a raw text file of a book I am trying to tag with pages.

Say the text file is:

some words on this line,
1
DOCUMENT TITLE some more words here too.
2
DOCUMENT TITLE and finally still more words.

I am trying to use Python to modify the example text to read:

some words on this line,
</pg>
<pg n=2>some more words here too.
</pg>
<pg n=3>and finally still more words.

My strategy is to load the text file as a string, build search-for and replace-with strings for each number in a list, replace every instance in the string, and write the result to a new file.

Here is the code I've written:

from sys import argv
script, input, output = argv

textin = open(input,'r')
bookstring = textin.read()
textin.close()

pages = []
x = 1
while x<400:
    pages.append(x)
    x = x + 1

pagedel = "DOCUMENT TITLE"

for i in pages:
    pgdel = "%d\n%s" % (i, pagedel)
    nplus = i + 1
    htmlpg = "</pg>\n<pg n=%d>" % nplus
    bookstring = bookstring.replace(pgdel, htmlpg)

textout = open(output, 'w')
textout.write(bookstring)
textout.close()

print "Updates to %s printed to %s" % (input, output)

The script runs without error, but it also makes no changes whatsoever to the input text. It simply reprints it character for character.

Does my mistake have to do with the hard return (\n)? Any help is greatly appreciated.

In Python, strings are immutable, so replace() returns a new string with the substitutions made instead of changing the original in place.

You must assign the result back:

bookstring = bookstring.replace(pgdel, htmlpg)
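To make the difference concrete, here is a minimal sketch using a shortened version of the sample text from the question. The variable names (text, unchanged, replaced) are only for illustration.

```python
# Strings are immutable: replace() builds and returns a new string.
text = "1\nDOCUMENT TITLE some more words"

# The return value is thrown away here, so text keeps its old contents.
text.replace("1\nDOCUMENT TITLE", "</pg>\n<pg n=2>")
unchanged = text

# Keeping the return value is what actually gives you the modified string.
replaced = text.replace("1\nDOCUMENT TITLE", "</pg>\n<pg n=2>")
```

After this runs, unchanged still holds the original text and only replaced contains the tags.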

You also forgot to call close(). See how you have textin.close? You have to call it with parentheses, like open():

textin.close()
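Here is a small sketch of what the missing parentheses do, plus the with statement as one way to avoid the problem altogether. The scratch file and the flag names (still_open, now_closed, closed_after_with) are made up for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, created only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "book.txt")
with open(path, "w") as f:
    f.write("some words on this line,\n")

textin = open(path, "r")
textin.close                      # only looks the method up; nothing is called
still_open = not textin.closed
textin.close()                    # the parentheses make the call happen
now_closed = textin.closed

# A with block closes the file on exit, even if read() raises an error.
with open(path, "r") as textin:
    bookstring = textin.read()
closed_after_with = textin.closed
```

With the with form there is no close() to forget, which is why it is usually the safer habit.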

Your code works for me, but here are a few more tips:

  • input is a built-in function, so consider renaming that variable. Shadowing it normally works, but it might be causing trouble for you.

  • When running the script, don't forget to include the .txt extension in the file names:

    $ python myscript.py file1.txt file2.txt

  • When testing your script, make sure file2 holds no stale output from an earlier run. (Opening it with 'w' truncates it, so old content should not survive.)
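The first two tips could be combined roughly like this. It is only a sketch: parse_args, in_path and out_path are names invented here, not anything from the question.

```python
def parse_args(argv):
    """Return (in_path, out_path) taken from a sys.argv-style list.

    The names in_path and out_path avoid shadowing the built-in input().
    """
    if len(argv) != 3:
        # Stop with a usage hint instead of failing on the tuple unpacking.
        raise SystemExit("usage: python myscript.py file1.txt file2.txt")
    script, in_path, out_path = argv
    return in_path, out_path


# In the real script you would pass sys.argv (after "import sys").
in_path, out_path = parse_args(["myscript.py", "file1.txt", "file2.txt"])
```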

I hope these help!

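Here's an entirely different approach that uses the re module:

import re

doctitle = False  # True when the previous line was a page number
newstr = ''
page = 1

for line in bookstring.splitlines():
    # Assumes only page-number lines start with a digit
    res = re.match(r'^\d+', line)
    if doctitle:
        # First line of a new page: drop the running title, open the tag
        newstr += '<pg n=' + str(page) + '>' + re.sub('^DOCUMENT TITLE ', '', line) + '\n'
        doctitle = False
    elif res:
        # Page-number line: close the current page
        doctitle = True
        page += 1
        newstr += '</pg>\n'
    else:
        # Ordinary line: keep it, with its line break
        newstr += line + '\n'

print(newstr)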
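If you'd rather not track state line by line, the same idea can probably be done with a single re.sub call. This is a sketch, not a tested drop-in: tag_pages is a made-up helper, and it assumes each page number sits alone on its line right before the title.

```python
import re


def tag_pages(text, title="DOCUMENT TITLE"):
    """Swap each 'N' line plus the following title for </pg> and <pg n=N+1>."""
    # A line holding only digits, optional trailing blanks, a line break
    # (with or without \r), then the running title and the blanks after it.
    pattern = re.compile(
        r"^(\d+)[ \t]*\r?\n" + re.escape(title) + r"[ \t]*",
        re.MULTILINE,
    )

    def repl(match):
        # The next page number comes from the number found in the text.
        return "</pg>\n<pg n=%d>" % (int(match.group(1)) + 1)

    return pattern.sub(repl, text)


sample = (
    "some words on this line,\n"
    "1\n"
    "DOCUMENT TITLE some more words here too.\n"
    "2\n"
    "DOCUMENT TITLE and finally still more words.\n"
)
tagged = tag_pages(sample)
```

Because the pattern tolerates trailing blanks and an optional \r, it is a bit more forgiving than an exact replace() on "%d\n%s".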

Since no one knows what's going on, it's worth a try.
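One thing nobody in the thread has confirmed, but which would explain replace() silently matching nothing, is that the characters between the page number and the title are not a bare \n (a trailing space or a \r, for example). A quick way to check is to look at what really precedes the title. The helper name context_before and the sample string are invented for this sketch.

```python
def context_before(text, marker="DOCUMENT TITLE", width=4, limit=3):
    """Return the few characters that precede the first occurrences of marker."""
    contexts = []
    start = 0
    while len(contexts) < limit:
        idx = text.find(marker, start)
        if idx == -1:
            break
        contexts.append(text[max(0, idx - width):idx])
        start = idx + len(marker)
    return contexts


# Made-up sample with a Windows-style line ending after the page number.
sample = "some words,\n1\r\nDOCUMENT TITLE more words"
for chunk in context_before(sample):
    print(repr(chunk))  # repr() makes invisible characters such as \r visible
```

If the output shows something other than a digit followed by \n, the search string "%d\n%s" will never match and the text comes out unchanged.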

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 