Python: How to validate input file line-by-line, fix possible errors, and write cleaned lines to another file?
The lines in my text file look like this:
data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0
data/processed/10/blueprint-0.png,3350.0,387.0,3353.0,388.0
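The number at the position of 1915 should always be smaller than the one at the position of 1933, and the element at the position of 387 should always be smaller than the element at the position of 402.
Unfortunately, this is not always the case, because my data is not entirely clean. To fix this, I want to create another file: I simply copy the line if it is correct and, if it is not, make the necessary adjustments so that it is fixed in the new file (I do not want to manipulate the data in the original file).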
My code:
path = 'data/faulty.txt'
with open(path) as f:
    with open('data/true_values.txt', 'a') as the_file:
        for line in f:
            numbers = re.findall(r'\d+', line)
            if numbers:
                if numbers[2] > numbers[6]:
                    temp = numbers[2]
                    numbers[2] = numbers[6]
                    numbers[6] = temp
                if numbers[4] > numbers[8]:
                    temp = numbers[2]
                    numbers[2] = numbers[6]
                    numbers[6] = temp
            the_file.write(line)
How can I make the changes? I also considered using re.sub but could not get it to work.
An example without re:
input_filename = 'full_path_to_my_input_file.txt'
output_filename = 'full_path_to_my_output_file.txt'
with open(output_filename, 'a') as f_out:
    with open(input_filename, 'r') as f_in:
        for line in f_in:
            records = line.strip().split(',')
            if float(records[1]) > float(records[3]):
                records[1], records[3] = records[3], records[1]
            if float(records[2]) > float(records[4]):
                records[2], records[4] = records[4], records[2]
            f_out.write(','.join(records) + '\n')
Input:
data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0
data/processed/10/blueprint-0.png,3353.0,389.0,3350.0,388.0
data/processed/10/blueprint-0.png,952.0,724.0,1010.0,734.0
Output:
data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0
data/processed/10/blueprint-0.png,3350.0,388.0,3353.0,389.0 ## swapped !!
data/processed/10/blueprint-0.png,952.0,724.0,1010.0,734.0
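I would collect the modified lines in a list as you go, then write the list to the file at the end. That way you do not have both files open while processing the first one, which makes it more atomic. I also fixed the regular expression, which I had missed earlier.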
import re

input_path = "data/faulty.txt"
output_path = "data/true_values.txt"
new = []
with open(input_path) as f:
    for line in f:
        name, numberstr = line.split(',', 1)
        # Match decimal numbers first, then plain integers
        numbers = re.findall(r'\d+\.\d+|\d+', numberstr)
        if numbers:
            # Compare as floats; comparing the strings would be lexicographic
            if float(numbers[0]) > float(numbers[2]):
                numbers[0], numbers[2] = numbers[2], numbers[0]
            if float(numbers[1]) > float(numbers[3]):
                numbers[1], numbers[3] = numbers[3], numbers[1]
            new.append("{},{}".format(name, ','.join(numbers)))
with open(output_path, 'a') as the_file:
    for x in new:
        the_file.write(x + '\n')
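If the point of collecting the lines first is to avoid leaving a half-written output file behind, one option is to write to a temporary file and rename it into place. This is a different technique from the list-based code above, not something that answer uses. A minimal sketch, where write_lines_atomically is a hypothetical helper name:

```python
import os
import tempfile


def write_lines_atomically(lines, output_path):
    """Write lines to output_path via a temporary file and os.replace."""
    directory = os.path.dirname(os.path.abspath(output_path))
    # Create the temp file next to the target so the final rename stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            for line in lines:
                tmp_file.write(line + "\n")
        # Swap the finished file into place in a single step.
        os.replace(tmp_path, output_path)
    except BaseException:
        # Clean up the partial temp file; the target is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Note that this replaces the output file instead of appending to it, so it only fits if the output is regenerated in full on every run.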
I believe it can be done without re. Try this:
path = 'data/faulty.txt'  # input path, as in the question
with open(path) as f, open('output.txt', 'w') as outputFile:
    for line in f:
        # Strip the trailing newline before splitting so it never ends up mid-line
        lineArr = line.rstrip("\n").split(",")
        if float(lineArr[1]) > float(lineArr[3]):
            lineArr[1], lineArr[3] = lineArr[3], lineArr[1]
        if float(lineArr[2]) > float(lineArr[4]):
            lineArr[2], lineArr[4] = lineArr[4], lineArr[2]
        outputFile.write(",".join(lineArr) + "\n")
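Do not use regular expressions when they are not the right tool. You obviously have a csv format, so use the csv module. You also need to convert the "numbers" to actual numbers: what you read from the file are strings, not numbers. Finally, once a line has been parsed and fixed, you have to rebuild the new line from the fixed values before writing it back: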
# XXX untested code, may contain typos or small bugs
import csv

inpath = 'data/faulty.txt'
outpath = 'data/true_values.txt'

# newline='' is what the csv docs recommend for files passed to reader/writer
with open(inpath, newline='') as infile, open(outpath, 'a', newline='') as outfile:
    # please check the csv doc for the correct options for your file format
    reader = csv.reader(infile, delimiter=",")
    writer = csv.writer(outfile, delimiter=",")
    for row in reader:
        # split the path from the numbers
        imagepath, nums = row[0], row[1:]
        # convert numbers to floats so we have
        # meaningful comparisons
        nums = [float(num) for num in nums]
        # swap the numbers if necessary
        if nums[0] > nums[2]:
            nums[2], nums[0] = nums[0], nums[2]
        if nums[1] > nums[3]:
            nums[3], nums[1] = nums[1], nums[3]
        # recreate the fixed row and write it
        newrow = [imagepath] + nums
        writer.writerow(newrow)
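To see why the float conversion matters: strings are compared character by character, so "387.0" > "1933.0" is true even though 387 is the smaller number. The sketch below runs the same csv-based approach on in-memory text through io.StringIO, so it can be tried without touching any files. fix_rows is a hypothetical name, and lineterminator="\n" is my choice here, not something the answer specifies.

```python
import csv
import io


def fix_rows(text):
    """Return CSV text with each row's coordinates put in ascending order."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" endings.
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        imagepath, nums = row[0], [float(n) for n in row[1:]]
        if nums[0] > nums[2]:
            nums[0], nums[2] = nums[2], nums[0]
        if nums[1] > nums[3]:
            nums[1], nums[3] = nums[3], nums[1]
        writer.writerow([imagepath] + nums)
    return out.getvalue()


# String comparison is lexicographic, so it gives the wrong answer here.
print("387.0" > "1933.0")                # True
print(float("387.0") > float("1933.0"))  # False
```

Because the values are written back as floats, an input such as 1915 would come out as 1915.0. If the original formatting has to be preserved exactly, swap the original strings and use the floats only for the comparison.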
path = 'data/faulty.txt'
with open(path, "r") as f, open('data/true_values.txt', "a") as the_file:
    for line in f:
        lineArr = line[:-1].split(",")
        if float(lineArr[1]) > float(lineArr[3]):
            lineArr[1], lineArr[3] = lineArr[3], lineArr[1]
        if float(lineArr[2]) > float(lineArr[4]):
            lineArr[2], lineArr[4] = lineArr[4], lineArr[2]
        the_file.write(",".join(lineArr) + "\n")
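The important part is lineArr = line[:-1].split(",").
Doing it this way drops the newline character from the last element of the list. Otherwise a newline would end up in the middle of the line whenever the last number gets swapped. Try lineArr = line.split(",") with the input I provided to see why this matters. Note that line[:-1] assumes every line, including the last one, ends with a newline.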
Using split you get a list. From it you can use slicing to get the data, convert it to float, and compare the values; if they are not in the order you want, they are swapped.
data/faulty.txt:
data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0
data/processed/10/blueprint-0.png,1234.5,387.0,1222.1,380.0
data/processed/10/blueprint-0.png,3350.0,387.0,3353.0,388.0
After running the Python script.
data/true_values.txt:
data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0
data/processed/10/blueprint-0.png,1222.1,380.0,1234.5,387.0 #Swapped
data/processed/10/blueprint-0.png,3350.0,387.0,3353.0,388.0
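The newline problem described above can be reproduced on the second sample line without any files. This is only an illustrative sketch; swap_fields is a hypothetical helper name.

```python
line = "data/processed/10/blueprint-0.png,1234.5,387.0,1222.1,380.0\n"


def swap_fields(fields):
    """Swap coordinate pairs in place so that x1 <= x2 and y1 <= y2."""
    if float(fields[1]) > float(fields[3]):
        fields[1], fields[3] = fields[3], fields[1]
    if float(fields[2]) > float(fields[4]):
        fields[2], fields[4] = fields[4], fields[2]
    return fields


# Without stripping, the last field is "380.0\n" and the newline moves
# into the middle of the line when that field is swapped.
broken = ",".join(swap_fields(line.split(",")))
# With the trailing newline removed first, the fields stay clean.
fixed = ",".join(swap_fields(line[:-1].split(","))) + "\n"

print(repr(broken))
print(repr(fixed))
```

float() ignores surrounding whitespace, so the comparison itself still works on "380.0\n"; the damage only shows up when the fields are joined back together.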
This should work (file access stripped out so that the problem can be reproduced):
# Named input_lines/output_lines to avoid shadowing the built-in input()
input_lines = ['data/processed/10/blueprint-0.png,1915.0,387.0,1933.0,402.0',
               'data/processed/10/blueprint-0.png,3353.0,387.0,3350.0,388.0']
output_lines = []
for input_line in input_lines:
    numbers = input_line.split(',')
    if numbers:
        if float(numbers[1]) > float(numbers[3]):
            numbers[1], numbers[3] = numbers[3], numbers[1]
        if float(numbers[2]) > float(numbers[4]):
            numbers[2], numbers[4] = numbers[4], numbers[2]
        output_lines.append(','.join(numbers))
print(output_lines)
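The snippets in this thread assume that every line holds a path followed by exactly four numeric fields. If the file may also contain blank or malformed lines (an assumption, the question does not say so), a sketch like the following rejects those lines instead of raising an exception. fix_or_reject is a hypothetical name.

```python
def fix_or_reject(line):
    """Return the corrected line, or None if the line is not parseable."""
    fields = line.strip().split(",")
    if len(fields) != 5:
        return None
    try:
        x1, y1, x2, y2 = (float(value) for value in fields[1:])
    except ValueError:
        return None
    # Swap the original strings so the number formatting is preserved.
    if x1 > x2:
        fields[1], fields[3] = fields[3], fields[1]
    if y1 > y2:
        fields[2], fields[4] = fields[4], fields[2]
    return ",".join(fields)


lines = [
    "data/processed/10/blueprint-0.png,3353.0,387.0,3350.0,388.0",
    "data/processed/10/blueprint-0.png,oops,387.0,3350.0,388.0",
    "",
]
good = [fixed for fixed in map(fix_or_reject, lines) if fixed is not None]
print(good)
```

Rejected lines could just as well be logged or written to a separate file for manual review; which of these is appropriate depends on how the data is used afterwards.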