I'm trying to extract lines from a very large text file (10Gb). The text file contains the output from an engineering software (it's not a CSV file). I want to copy from line 1 to the first line containing the string 'stop' and then resume from the first line containing 'restart' to the end of the file.
The following code works but it's rather slow (about a minute). Is there a better way to do it using pandas? I have tried the read_csv function but I don't have a delimiter to input.
file_to_copy = r"C:\Users\joedoe\Desktop\C ANSYS R1\PATCHED\modes.txt"
output = r"C:\Users\joedoe\Desktop\C ANSYS R1\PATCHED\modes_extract.txt"
stop = '***** EIGENVECTOR (MODE SHAPE) SOLUTION *****'
restart = '***** PARTICIPATION FACTOR CALCULATION ***** X DIRECTION'
with open(file_to_copy) as f:
orig = f.readlines()
newf = open(output, "w")
write = True
first_time = True
for line in orig:
if first_time == True:
if stop in line:
first_time = False
write = False
for i in range(300):
newf.write(
'\n -------------------- MIDDLE OF THE FILE -------------------')
newf.write('\n\n')
if restart in line: write = True
if write: newf.write(line)
newf.close()
print('Done.')
readlines
iterates over the whole file. Then you iterate over the result of readlines
. I think the following edit will save you one whole iteration through the big file.
write = True
first_time = True
with open(file_to_copy) as f, open(output, "w") as newf:
for line in f:
if first_time == True:
if stop in line:
first_time = False
write = False
for i in range(300):
newf.write(
'\n -------------------- MIDDLE OF THE FILE -------------------')
print('\n\n')
if restart in line: write = True
if write: newf.write(line)
print('Done.')
You should use python generators. Also printing makes the process slower.
Following are few examples to use generators:
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.