
How do I read / write a file in Python (3) on Windows without introducing carriage returns?

I want to open a file using Python on Windows, perform some regex operations, optionally alter the content and then write the result back to a file.

I can create an example file which looks right (based on the comments on using binary mode in other posts on SO and within the documentation). What I can't see is how I convert the 'binary' data to a usable form without introducing '\r' characters.

An example:

import re

# Create an example file which represents the one I'm actually working on (a Jenkins config file if you're interested).
testFileName = 'testFile.txt'
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')

# Try and read the file in as I would in the script I was trying to write.
content = ""
with open(testFileName, 'rb') as content_file:
    content = content_file.read()

# Do something to the content
exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content) # <-- Fails with a TypeError: a str pattern can't be used on bytes ('binary data')

# Write the file back to disk and then realise, frustratingly, that something in this process has introduced carriage returns onto every line.
outputFilename = 'output_'+testFileName
with open(outputFilename, 'wb') as output_file:
    output_file.write(content)
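The re.sub call above fails with a TypeError because a str pattern cannot be applied to bytes. One way around that, sketched here under the assumption that the file can be processed entirely as bytes, is to use a bytes pattern and a bytes replacement, so that no decoding and no newline translation happen at all. The temporary directory is only there to keep the example self-contained.

```python
import os
import re
import tempfile

# Work in a throwaway directory so the sketch doesn't touch real files.
workdir = tempfile.mkdtemp()
test_path = os.path.join(workdir, 'testFile.txt')
output_path = os.path.join(workdir, 'output_testFile.txt')

# Same sample data as the question, written as raw bytes.
with open(test_path, 'wb') as f:
    f.write(b'this\nis\na\ntest')

with open(test_path, 'rb') as f:
    content = f.read()  # bytes, exactly as stored on disk

# A str pattern here would raise
# "TypeError: cannot use a string pattern on a bytes-like object",
# so both the pattern and the replacement are bytes literals.
example_regex = re.compile(b'a\ntest')
content = example_regex.sub(b'a\nworking\ntest', content)

# Binary mode never translates b'\n', so no b'\r' is added on Windows.
with open(output_path, 'wb') as f:
    f.write(content)

with open(output_path, 'rb') as f:
    result = f.read()
print(result)  # b'this\nis\na\nworking\ntest'
```

This only works as long as the pattern can be expressed in bytes; anything that depends on decoded characters (for example non-ASCII word matching) needs the file to be decoded first.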

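I presume you mean that your text file contains carriage returns and you don't want them included in the text.

When you open the file in text mode for reading, e.g. with open(fileName, 'r', encoding="utf-8", errors="ignore") as content_file, the newline argument of open controls this. Leaving it at its default (None) enables universal newlines mode, which converts every "\r\n" into "\n" as the file is read. Note that setting newline="\r\n" does the opposite of what you want: lines are split on "\r\n", but the carriage returns are kept in the returned text.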
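How the newline argument behaves on reading can be checked directly. The sketch below writes Windows-style line endings in binary mode (the file name and contents are made up for the demonstration) and then reads them back in text mode with three different newline settings.

```python
import os
import tempfile

# Create a file with Windows-style line endings, written as raw bytes.
path = os.path.join(tempfile.mkdtemp(), 'crlf.txt')
with open(path, 'wb') as f:
    f.write(b'this\r\nis\r\na\r\ntest')

def read_with(newline):
    # Text mode; only the newline argument varies between calls.
    with open(path, 'r', encoding='utf-8', newline=newline) as f:
        return f.read()

default = read_with(None)      # universal newlines: '\r\n' -> '\n'
untranslated = read_with('')   # line endings recognised but left as-is
crlf = read_with('\r\n')       # split on '\r\n', returned untranslated

print(repr(default))       # 'this\nis\na\ntest'
print(repr(untranslated))  # 'this\r\nis\r\na\r\ntest'
print(repr(crlf))          # 'this\r\nis\r\na\r\ntest'
```

On reading, only the default newline=None converts "\r\n" to "\n"; both '' and '\r\n' hand the carriage returns back untranslated.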

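Edit: Or, if you want to work only with "\n" line endings, this working example should do it. Passing newline='\n' to open turns off newline translation, so "\n" is never rewritten as "\r\n" when the file is written on Windows.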

import re

testFileName = 'testFile.txt'
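# newline='\n' disables translation on write: '\n' stays '\n', even on Windows.
with open(testFileName, 'w', newline='\n') as output_file:
    output_file.write('this\nis\na\ntest')

# On read, newline='\n' splits lines on '\n' only and leaves any '\r' untouched.
with open(testFileName, 'r', newline='\n') as content_file:
    content = content_file.read()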

exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)

outputFilename = 'output_'+testFileName
# Write back with the same setting so no '\r' characters are introduced.
with open(outputFilename, 'w', newline='\n') as output_file:
    output_file.write(content)
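To see what newline does on the write side, a minimal sketch can write the same string with different settings and inspect the raw bytes that end up on disk (the temporary file name is only illustrative). newline='\n' and newline='' both disable translation, newline='\r\n' forces Windows line endings on any platform, and the default None uses os.linesep, which is '\r\n' on Windows.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
text = 'this\nis\na\ntest'

def raw_bytes_after_write(newline):
    # Write in text mode with the given newline setting, then read raw bytes back.
    path = os.path.join(workdir, 'out.txt')
    with open(path, 'w', encoding='utf-8', newline=newline) as f:
        f.write(text)
    with open(path, 'rb') as f:
        return f.read()

no_translation = raw_bytes_after_write('\n')   # '\n' written as b'\n'
empty = raw_bytes_after_write('')              # also no translation
forced_crlf = raw_bytes_after_write('\r\n')    # every '\n' becomes b'\r\n'
platform = raw_bytes_after_write(None)         # every '\n' becomes os.linesep

print(no_translation)  # b'this\nis\na\ntest'
print(forced_crlf)     # b'this\r\nis\r\na\r\ntest'
print(platform == text.replace('\n', os.linesep).encode('utf-8'))  # True
```

The default case is the one that quietly adds carriage returns on Windows, which matches the symptom described in the question.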

If I interpreted the question correctly: I first decoded the bytes into a string, then did the regex substitution. Next, I encoded the string back into bytes and wrote those to the output file.

import re

testFileName = 'testFile.txt'
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')

# Binary mode performs no newline translation; decode the raw bytes to str.
with open(testFileName, 'rb') as content_file:
    content = content_file.read().decode('utf-8')

exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)

outputFilename = 'output_'+testFileName
# Encode back to bytes; a binary-mode write adds no '\r' on Windows.
with open(outputFilename, 'wb') as output_file:
    output_file.write(content.encode('utf-8'))
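The decode, substitute, encode steps above can be wrapped in a small helper. This is only a sketch: rewrite_file is a hypothetical name, and UTF-8 is assumed to be the file's encoding. Because both the read and the write happen in binary mode, whatever line endings the source file already uses (LF or CRLF) pass through unchanged.

```python
import os
import re
import tempfile

def rewrite_file(src, dst, pattern, repl, encoding='utf-8'):
    """Hypothetical helper: apply re.sub to a file's text with no newline translation.

    The file is read and written in binary mode, so existing LF or CRLF
    line endings are carried through exactly as they were.
    """
    with open(src, 'rb') as f:
        text = f.read().decode(encoding)
    text = re.sub(pattern, repl, text)
    with open(dst, 'wb') as f:
        f.write(text.encode(encoding))

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'testFile.txt')
dst = os.path.join(workdir, 'output_testFile.txt')

# Same sample data as the question.
with open(src, 'wb') as f:
    f.write(b'this\nis\na\ntest')

rewrite_file(src, dst, 'a\ntest', 'a\nworking\ntest')

with open(dst, 'rb') as f:
    result = f.read()
print(result)  # b'this\nis\na\nworking\ntest'
```

If the file is not UTF-8, pass the correct encoding; a wrong one will raise UnicodeDecodeError rather than silently corrupt the data.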
