How do I read / write a file in Python (3) on Windows without introducing carriage returns?

Question

I want to open a file using Python on Windows, perform some regex operations, optionally alter the content and then write the result back to a file.

I can create an example file which looks right (based on the comments on using binary mode in other posts on SO and within the documentation). What I can't see is how to convert the 'binary' data to a usable form without introducing '\r' characters.

An example:

import re

# Create an example file which represents the one I'm actually working on (a Jenkins config file if you're interested).
testFileName = 'testFile.txt'
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')

# Try and read the file in as I would in the script I was trying to write.
content = ""
with open(testFileName, 'rb') as content_file:
    content = content_file.read()

# Do something to the content
exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content) # <-- Fails because it won't operate on 'binary data'

# Write the file back to disk and then realise, frustratingly, that something in this process has introduced carriage returns onto every line.
outputFilename = 'output_'+testFileName
with open(outputFilename, 'wb') as output_file:
    output_file.write(content)

Answer 1

I presume you mean that your text file contains carriage returns and you don't want them included in the text.

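If you open the file in text mode and leave newline at its default of None, e.g. with open(fileName, 'r', encoding="utf-8", errors="ignore") as content_file

then any "\r\n" line endings are translated to "\n" as you read, so the carriage returns are consumed. Setting newline="\r\n" does not do this: lines are then split only on "\r\n" and the endings are returned to you untranslated.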
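
To see what the newline argument does on reading, here is a minimal sketch. The scratch file name and the sample bytes are made up for illustration; the behaviour shown is what the open() documentation describes for newline=None versus newline="\r\n".

```python
import os
import tempfile

# Hypothetical scratch file containing Windows-style line endings.
path = os.path.join(tempfile.mkdtemp(), "crlf_demo.txt")
with open(path, "wb") as f:
    f.write(b"this\r\nis\r\na\r\ntest")

# Default newline=None: '\r\n' is translated to '\n' while reading.
with open(path, "r", encoding="utf-8") as f:
    translated = f.read()

# newline='\r\n': lines end only at '\r\n', and the endings are kept as-is.
with open(path, "r", encoding="utf-8", newline="\r\n") as f:
    untranslated = f.read()

print(repr(translated))    # 'this\nis\na\ntest'
print(repr(untranslated))  # 'this\r\nis\r\na\r\ntest'
```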

Edit: Or if you want to operate only on \n then this working example should do it.

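import re

testFileName = 'testFile.txt'
# newline='\n' stops Python translating '\n' to '\r\n' when writing on Windows.
with open(testFileName, 'w', newline='\n') as output_file:
    output_file.write('this\nis\na\ntest')

# Read the file back as text; with newline='\n' the line endings are returned untranslated.
content = ""
with open(testFileName, 'r', newline='\n') as content_file:
    content = content_file.read()

# content is a str here, so an ordinary str pattern works.
exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)

# Write the result, again without any newline translation.
outputFilename = 'output_'+testFileName
with open(outputFilename, 'w', newline='\n') as output_file:
    output_file.write(content)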
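
If you want to confirm that no carriage returns ended up on disk, one option is to read the output back in binary mode and look for b"\r". This is only a sketch; the scratch path is hypothetical and stands in for the output file above.

```python
import os
import tempfile

# Hypothetical scratch path, used only for this check.
path = os.path.join(tempfile.mkdtemp(), "lf_demo.txt")

# newline='\n' disables the '\n' -> os.linesep translation on write.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("this\nis\na\nworking\ntest")

# Read the raw bytes back so nothing is translated on the way in.
with open(path, "rb") as f:
    raw = f.read()

print(raw)           # b'this\nis\na\nworking\ntest'
print(b"\r" in raw)  # False
```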

Answer 2

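If I've interpreted the question correctly, you can decode the bytes to a string, do the regex substitution, and then encode the string back into bytes before writing it to the output file. Because both files are opened in binary mode, no newline translation takes place, so no '\r' characters are added.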

import re

testFileName = 'testFile.txt'
# Binary mode writes the bytes exactly as given, with no newline translation.
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')

# Read the raw bytes and decode them so the str regex below can be applied.
content = ""
with open(testFileName, 'rb') as content_file:
    content = content_file.read().decode('utf-8')

exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)

# Encode back to bytes before writing in binary mode.
outputFilename = 'output_'+testFileName
with open(outputFilename, 'wb') as output_file:
    output_file.write(content.encode('utf-8'))
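
As a variation on the above (not what this answer does), the re module also accepts bytes patterns, so the decode/encode round trip can be skipped as long as the pattern, the replacement and the input are all bytes. A minimal sketch using the same sample data, held in memory rather than read from a file:

```python
import re

# Same sample data as above, kept in memory as bytes.
content = b"this\nis\na\ntest"

# Bytes pattern and bytes replacement: all three arguments must be bytes.
example_regex = re.compile(b"a\ntest")
content = example_regex.sub(b"a\nworking\ntest", content)

print(content)  # b'this\nis\na\nworking\ntest'
```

This avoids having to know the file's encoding, but it only suits patterns that can be expressed on raw bytes.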

2 answers

solution1
2 ACCEPTED 2015-09-14 14:46:49

solution2
1 2015-09-14 14:46:07
