简体   繁体   中英

Python - Read 3 line in a textfile at the same time

i have a file.txt that look like this.

testings 1
response 1-a
time 32s

testings 2
response 2-a
time 32s

testings 3
*blank*

testings 4
error

testings 5
response 5-a
time 26s

and prints

['testings 1', 'testings 2', 'testings 3', 'testings 4', 'testings 5']     
['response 1-a', 'response 2-a', 'response 5-a']
['time 32s', 'time 20s', 'time 26s']

So it´sa simpel code i have, it opens the file, uses readlines() and looks for the keywords testings , response and time then appends the string to 3 seperat lists. As shown in the file.txt some testings x are either *blank* or has an error instead off a response . My problem is that i need the lists to always have the same lenght. Like this:

 ['testings 1', 'testings 2', 'testings 3', 'testings 4', 'testings 5']
 ['response 1-a', 'response 2-a', '*error*', '*error*', 'response 5-a']
 ['time 32s', 'time 20s', '*error*', '*error*',  'time 26s']

So i was thinking if it´s posbile to "read for 3 lines at the same time" and have a if-statment where all the 3 lines need to have the right keywords ("be True") or else insert *error* in the response and time list to keep the lenght right. Or is there even a better way to keep 3 list at the same lenght?

test = []
response = []
time =[]

with open("textfile.txt",'r') as txt_file:
    for line in txt_file.readlines():
        if ("testings") in line:
            test.append(line.strip())    
        if ("response") in line:
            response.append(line.strip())
        if ("time") in line:
            time.append(line.strip())


print (response)
print (test)
print (time)

Text file are iterables , meaning you can loop over them directly, or you can use the next() function to get another line from them. The file object will always produce the next line in the file whatever method you are using, even when mixing techniques.

You can use this to pull in more lines in a for loop:

with open("textfile.txt",'r') as txt_file:
    for line in txt_file:
        line = line.strip()
        if line.startswith('testings'):
            # expect two more lines, response and time
            response_line = next(txt_file, '')
            if not response_line.startswith('response'):
                # not a valid block, scan forward to the next testings
                continue
            time_line = next(txt_file, '')
            if not time_line.startswith('time'):
                # not a valid block, scan forward to the next testings
                continue
            # valid block, we got our three elements
            test.append(line) 
            response.append(response_line.strip())
            time.append(time_line.strip())

So when a line starting with testings is found, the code pulls in the next line. If that line starts with response , another line is pulled in. If that line starts with time , then all three lines are appended to your data structures. If neither of those two conditions are met, the the outer for loop is continued and reading the file continues until another testings line is found.

The added bonus is that the file is never read into memory in one go. File buffering keeps this efficient, but otherwise you never need more memory than is needed for the final set of lists (valid data), and the three lines currently being tested.

Side note: I'd strongly recommend you do not use three separate lists of equal length. You could just use a single list with tuples:

test_data = []
# ... in the loop ...
test_data.append((line, response_line.strip(), time_line.strip()))

and then use that single list to keep each triplet of information together. You can even use a named tuple :

from collections import namedtuple

TestEntry = namedtuple('TestEntry', 'test response time')

# ... in the loop
test_data.append(TestEntry(line, response_line.strip(), time_line.strip()))

at which point each entry in the test_data list is an object with test , response and time attributes:

for entry in test_data:
    print(entry.test, entry.response, entry.time)

This snippet does what you are seeking. You can use next(txt_file, '') to retrieve the next line without having to load the file into memory first. Then, you look only for lines that contain "testing", and when you do, you compare the next two lines. it will always add one string to each list, any time it finds "testing", however, if it doesn't find "response" or "time" then it will insert errors where appropriate. Here is the code, using the input you gave above.

with open("textfile.txt", "r") as txt_file:
     test = []
     response = []
     time = []
     for line in txt_file:
         if "testings" in line:
             test_line = line.strip()
             response_line = next(txt_file, '').strip()
             time_line = next(txt_file, '').strip()
             test.append(test_line)
             if "response" in response_line:
                 response.append(response_line)
             else:
                 response.append("*error*")
             if "time" in time_line:
                 time.append(time_line)
             else:
                 time.append("*error*")

And the Output:

In : test
Out: ['testings 1', 'testings 2', 'testings 3', 'testings 4', 'testings 5']

In : response
Out: ['response 1-a', 'response 2-a', '*error*', '*error*', 'response 5-a']

In : time
Out: ['time 32s', 'time 32s', '*error*', '*error*', 'time 26']

In : len(test), len(response), len(time)
Out: (5, 5, 5)

From the answer here

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

with open("textfile.txt",'r') as txt_file:
    for batch in grouper(txt.readlines, 3):
        if ("testings") in batch[0]:
            test.append(line.strip())
        else:
            test.append('error')
        if ("response") in batch[1]:
            response.append(line.strip())
        else:
            response.append('error')
        if ("time") in batch[2]:
            time.append(line.strip())
        else:
            time.append('error')

This assumes there will always be the lines in the same order, and that the file is always organised in batches of three lines, even if that is just a blank line. Since it actually looks like your input file has a blank line between each group of 3 you may need to change grouper to read batches of 4.

A quick working solution that is just reading the file as one string, and then operating on the string to get the desired output.

# -*- coding: utf-8 -*-

file = "test.txt"

with open(file, "r") as f:
    data = f.read()

data = data.split("testings")

# First one is empty because it start with "testings"
del data[0]

data = [elt.split("\n") for elt in data]

# Add the neccessary errors.
for i, elt in enumerate(data):
    if "response" not in elt[1]:
        data[i][1] = '*error*'
        data[i][2] = '*error*'

    # Because of the \n between the response, the elt length is not 3. Let's keep the 3 first ones.
    data[i] = data[i][:3]

print (data)

"""
[[' 1', 'response 1-a', 'time 32s'], 
[' 2', 'response 2-a', 'time 32s'], 
[' 3', '*error*', '*error*'], 
[' 4', '*error*', '*error*'], 
[' 5', 'response 5-a', 'time 26s']]
"""

# Now list comprehension to get your output

ID = ["testings" + elt[0] for elt in data]
Response = [elt[1] for elt in data]
Value = [elt[2] for elt in data]

It is fairly simple, and it takes into account every scenario you proposed.

Output:

print (ID)
['testings 1', 'testings 2', 'testings 3', 'testings 4', 'testings 5']

print (Response)
['response 1-a', 'response 2-a', '*error*', '*error*', 'response 5-a']

print (Value)
['time 32s', 'time 32s', '*error*', '*error*', 'time 26s']

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM