My lines in the text file look like this:
data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0
data/processed/10/blueprint-0.png,3350.0,387.0,3353.0,388.0
The number at the 1915 position should always be smaller than the element at the 1933 position, and the element at the 387 position should always be smaller than the element at the 402 position.
Unfortunately, that isn't always the case, as my data isn't perfectly clean. To fix it, I want to create another file: if a line is correct I just copy it, and if it isn't I write a corrected version to the new file (I don't want to modify the data in the original file).
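The rule above amounts to reordering each coordinate pair so that the smaller value comes first. A minimal sketch for one already-parsed row, assuming the four numbers are x1, y1, x2, y2 in that order (the function name normalize_box is hypothetical):

```python
def normalize_box(x1: float, y1: float, x2: float, y2: float) -> tuple:
    """Return (x1, y1, x2, y2) reordered so that x1 <= x2 and y1 <= y2."""
    # min/max gives the same result as "swap the pair if the first is larger"
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


print(normalize_box(3353.0, 389.0, 3350.0, 388.0))  # (3350.0, 388.0, 3353.0, 389.0)
```

Rows that are already correct come back unchanged, so the same call can be applied to every line.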
My code:
import re

path = 'data/faulty.txt'
with open(path) as f:
    with open('data/true_values.txt', 'a') as the_file:
        for line in f:
            numbers = re.findall(r'\d+', line)
            if numbers:
                if numbers[2] > numbers[6]:
                    temp = numbers[2]
                    numbers[2] = numbers[6]
                    numbers[6] = temp
                if numbers[4] > numbers[8]:
                    temp = numbers[4]
                    numbers[4] = numbers[8]
                    numbers[8] = temp
            the_file.write(line)
How can I apply the change to the line before writing it? I also thought about using re.sub but couldn't make it work.
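Since re.sub came up: it accepts a function as the replacement, which receives each match object and returns the text to substitute. A minimal sketch, assuming every line is a path followed by exactly four comma-separated numbers (the pattern and the names LINE_RE, _swap_fields and fix_line_with_sub are illustrative, not from the question):

```python
import re

# Assumed layout: everything before the last four fields is the path,
# followed by exactly four comma-separated numbers (x1, y1, x2, y2).
LINE_RE = re.compile(r"^(.*),([^,]+),([^,]+),([^,]+),([^,\n]+)$")


def _swap_fields(match: re.Match) -> str:
    """Build the replacement text for one matched line."""
    path, x1, y1, x2, y2 = match.groups()
    # Compare as floats; as strings, "952.0" > "1010.0" would be True.
    if float(x1) > float(x2):
        x1, x2 = x2, x1
    if float(y1) > float(y2):
        y1, y2 = y2, y1
    return ",".join([path, x1, y1, x2, y2])


def fix_line_with_sub(line: str) -> str:
    """Return the line with both coordinate pairs ordered; a trailing newline is kept."""
    return LINE_RE.sub(_swap_fields, line)


print(fix_line_with_sub("data/processed/10/blueprint-0.png,3353.0,389.0,3350.0,388.0\n"), end="")
```

Because $ matches just before a trailing newline, the newline is not part of the match and survives the substitution; lines that don't match the pattern are returned unchanged.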
Example without re:
input_filename = 'full_path_to_my_input_file.txt'
output_filename = 'full_path_to_my_output_file.txt'
with open(output_filename, 'a') as f_out:
    with open(input_filename, 'r') as f_in:
        for line in f_in:
            records = line.strip().split(',')
            if float(records[1]) > float(records[3]):
                records[1], records[3] = records[3], records[1]
            if float(records[2]) > float(records[4]):
                records[2], records[4] = records[4], records[2]
            f_out.write(','.join(records) + '\n')
input:
data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0
data/processed/10/blueprint-0.png,3353.0,389.0,3350.0,388.0
data/processed/10/blueprint-0.png,952.0,724.0,1010.0,734.0
output:
data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0
data/processed/10/blueprint-0.png,3350.0,388.0,3353.0,389.0 ## swapped !!
data/processed/10/blueprint-0.png,952.0,724.0,1010.0,734.0
I'd write the modified lines to a list as you go, then write the list out to a file at the end. That way you're not holding both files open while you process the first one, which makes it a more atomic operation. I have also fixed the regex so it handles floats, which I missed earlier.
import re

input_path = "data/faulty.txt"
output_path = "data/true_values.txt"
new = []
with open(input_path) as f:
    for line in f:
        name, numberstr = line.split(',', 1)
        numbers = re.findall(r'\d+\.\d+|\d+', numberstr)
        if numbers:
            # compare as floats: as strings, "952.0" > "1010.0"
            if float(numbers[0]) > float(numbers[2]):
                numbers[0], numbers[2] = numbers[2], numbers[0]
            if float(numbers[1]) > float(numbers[3]):
                numbers[1], numbers[3] = numbers[3], numbers[1]
            new.append("{},{}".format(name, ','.join(numbers)))
with open(output_path, 'a') as the_file:
    for x in new:
        the_file.write(x + '\n')
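Collecting the lines first makes it easy to go one step further toward atomicity: write them to a temporary file and rename it over the target. A minimal sketch, assuming you want to replace the output file rather than append to it as the 'a' mode above does (write_lines_atomically is a hypothetical helper name):

```python
import os
import tempfile


def write_lines_atomically(path: str, lines: list) -> None:
    """Write lines to path via a temporary file, then rename it into place.

    This replaces the whole output file instead of appending to it.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the target for the
    # final rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
        # On POSIX, os.replace() is an atomic rename: readers see either the
        # old file or the complete new one, never a half-written file.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

Note that mkstemp creates the file readable and writable only by its owner, so the renamed output may need its permissions adjusted.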
I believe it's possible without using re. Try running this:
path = 'data/faulty.txt'
with open(path) as f, open('output.txt', 'w') as outputFile:
    for line in f:
        # drop the trailing newline so it isn't swapped along with a value
        lineArr = line.rstrip("\n").split(",")
        if float(lineArr[1]) > float(lineArr[3]):
            lineArr[1], lineArr[3] = lineArr[3], lineArr[1]
        if float(lineArr[2]) > float(lineArr[4]):
            lineArr[2], lineArr[4] = lineArr[4], lineArr[2]
        outputFile.write(",".join(lineArr) + "\n")
Don't use regexps when they're not the right tool. You obviously have a CSV format, so use the csv module. Also, you need to convert your "numbers" to actual numbers: what you read in are strings, not numbers. And finally, once you have parsed and possibly fixed a line, you have to rebuild the line from the fixed values before you write it back:
# XXX untested code, may contain typos or small bugs
import csv

inpath = 'data/faulty.txt'
outpath = 'data/true_values.txt'

with open(inpath, newline='') as infile, open(outpath, 'a', newline='') as outfile:
    # please check the csv doc for the correct options for your file format
    reader = csv.reader(infile, delimiter=",")
    # csv.writer defaults to '\r\n' line endings; match the input's '\n'
    writer = csv.writer(outfile, delimiter=",", lineterminator="\n")
    for row in reader:
        # split the path from the numbers
        imagepath, nums = row[0], row[1:]
        # convert numbers to floats so we have
        # meaningful comparisons
        nums = [float(num) for num in nums]
        # swap the numbers if necessary
        if nums[0] > nums[2]:
            nums[2], nums[0] = nums[0], nums[2]
        if nums[1] > nums[3]:
            nums[3], nums[1] = nums[1], nums[3]
        # recreate the fixed row and write it
        newrow = [imagepath] + nums
        writer.writerow(newrow)
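To illustrate the point about converting to actual numbers, here is the difference for two values taken from the sample data in this thread:

```python
# Values read from the file are strings, and strings compare character by
# character: "9" > "1", so "952.0" counts as the larger one.
print("952.0" > "1010.0")                # True
print(float("952.0") > float("1010.0"))  # False
```

Any of the approaches here that compare the raw strings will swap such a pair even though it is already in the right order.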
path = 'data/faulty.txt'
with open(path, "r") as f, open('data/true_values.txt', "a") as the_file:
    for line in f:
        # remove the trailing newline before splitting
        lineArr = line.rstrip("\n").split(",")
        if float(lineArr[1]) > float(lineArr[3]):
            lineArr[1], lineArr[3] = lineArr[3], lineArr[1]
        if float(lineArr[2]) > float(lineArr[4]):
            lineArr[2], lineArr[4] = lineArr[4], lineArr[2]
        the_file.write(",".join(lineArr) + "\n")
lineArr = line.rstrip("\n").split(",")
This strips the newline character so it doesn't stay attached to the last element of the list. With a plain lineArr = line.split(","), the last number keeps its newline, so whenever that number gets swapped, a newline is written in the middle of the output line. Try both versions on the inputs I provided to see why this matters.
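A quick way to see the newline issue, using one of the sample lines from this answer:

```python
line = "data/processed/10/blueprint-0.png,1234.5,387.0,1222.1,380.0\n"

# split() leaves the newline attached to the last field
print(repr(line.split(",")[-1]))               # '380.0\n'
# strip it first and the last field is just the number
print(repr(line.rstrip("\n").split(",")[-1]))  # '380.0'
```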
Using split you get a list; indexing into it gives you each value, which is converted to float and compared, and if two values are not in the order you want, they are swapped.
data/faulty.txt:
data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0
data/processed/10/blueprint-0.png,1234.5,387.0,1222.1,380.0
data/processed/10/blueprint-0.png,3350.0,387.0,3353.0,388.0
After running the Python script:
data/true_values.txt:
data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0
data/processed/10/blueprint-0.png,1222.1,380.0,1234.5,387.0 #Swapped
data/processed/10/blueprint-0.png,3350.0,387.0,3353.0,388.0
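The same steps can be packed into a function and checked against the sample above without touching any files. A minimal sketch, assuming the path itself never contains a comma (the name fix_line is illustrative):

```python
def fix_line(line: str) -> str:
    """Order x1/x2 and y1/y2 in a 'path,x1,y1,x2,y2' line, smaller value first."""
    fields = line.rstrip("\n").split(",")
    # fields[1]/fields[3] hold the x values, fields[2]/fields[4] the y values
    for a, b in ((1, 3), (2, 4)):
        if float(fields[a]) > float(fields[b]):
            fields[a], fields[b] = fields[b], fields[a]
    return ",".join(fields)


sample = [
    "data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0",
    "data/processed/10/blueprint-0.png,1234.5,387.0,1222.1,380.0",
    "data/processed/10/blueprint-0.png,3350.0,387.0,3353.0,388.0",
]
for row in sample:
    print(fix_line(row))
```

Keeping the per-line logic in a function separates it from the file handling, so the same function works with any of the read/write strategies in this thread.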
This should work (I stripped away the file access so the problem can be reproduced on its own):
input = ['data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0',
         'data/processed/10/blueprint-0.png,3353.0,387.0,3350.0,388.0']
output = []
for input_line in input:
    numbers = input_line.split(',')
    if numbers:
        if float(numbers[1]) > float(numbers[3]):
            numbers[1], numbers[3] = numbers[3], numbers[1]
        if float(numbers[2]) > float(numbers[4]):
            numbers[2], numbers[4] = numbers[4], numbers[2]
        output.append(','.join(numbers))
print(output)