Reading from textfile without having to use readline()

Question

This is a portion of a text file I have:

Participant: Interviewer 
Translation: <english>Mhlongo.</english> Okay Monde, what languages do you typically use with your family and why? 
        :  
Participant: Participant 
Translation: Okay <english>it was Zulu, eh and Sotho, eh:</english> my mom is Sotho and my father is Zulu so we her language most of the time. 
        :  
Participant: Interviewer 
Translation: Mh, and so <english>you speak</english> <english>you speak</english>. What languages or language do you use with friends and why? 
        :  
Participant: Participant 
Translation:  Eh, isiZulu.

I am trying to iterate through to get participant and interviewer translations. This is the code I have for it.

while True:
    interviewer = f.readline()
    interviewer_translation = f.readline()
    participant = f.readline()
    participant_translation = f.readline()
    ...
    if not participant_translation: break

However, the above code reads exactly one line per field, which doesn't work because a translation sometimes spans two or more lines. Is there a way to do this without having to use readline()?

Answer 1

You can iterate over the file line by line (for line in f) and concatenate lines up to a record delimiter, then process the concatenated chunk, e.g.:

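def process(participant, translation):
    pass  # placeholder: handle one complete record here

participant = None
translation = ''
for line in f:
    if line.startswith('Participant: '):
        # A new record starts, so the previous one is complete.
        if participant:
            process(participant, translation)
        participant = line
        translation = ''
    elif participant:
        # The 'Translation: ' line or a continuation line of a
        # multi-line translation (the ':' separator line included).
        translation += line
# Process the last record, if the file contained any.
if participant:
    process(participant, translation)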
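
As a rough illustration of the same accumulate-until-delimiter idea, here is a minimal self-contained sketch. The generator name iter_records, the cleanup of the 'Translation:' prefix, and the skipping of the bare ':' separator lines are assumptions for illustration, and the sample text is a shortened version of the excerpt in the question.

```python
import io

def iter_records(lines):
    """Yield (speaker, text) pairs from an iterable of lines."""
    speaker = None
    parts = []
    for line in lines:
        if line.startswith('Participant: '):
            # A new header line closes the previous record.
            if speaker is not None:
                yield speaker, ' '.join(parts)
            speaker = line[len('Participant: '):].strip()
            parts = []
        elif speaker is not None:
            text = line.strip()
            if text.startswith('Translation:'):
                text = text[len('Translation:'):].strip()
            # Skip blank lines and the bare ':' separator lines.
            if text and text != ':':
                parts.append(text)
    # The last record has no header after it, so emit it here.
    if speaker is not None:
        yield speaker, ' '.join(parts)

# Shortened sample; any iterable of lines (such as an open file) works.
sample = io.StringIO(
    "Participant: Interviewer \n"
    "Translation: What languages do you use\n"
    "with your family and why? \n"
    "        :  \n"
    "Participant: Participant \n"
    "Translation:  Eh, isiZulu.\n"
)
records = list(iter_records(sample))
```

Because it is a generator, only one record is held in memory at a time, so it should also work for large files.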

Or you can use f.read(size) to read a bigger chunk of the file, or the whole file if the size argument is omitted:

>>> f.read()
'This is the entire file.\n'

Then you can use a multiline regex to extract meaningful chunks of text from it, e.g. entire records:

>>> re.findall(r'(?P<record>^Participant:.*?)(?=(?:Participant:|\Z))', text, re.S | re.M)
['Participant: Interviewer\nTranslation: <english>Mhlongo.</english> Okay Monde, what languages do you typically use with your family and why?\n        :\n', 'Participant: Participant\nTranslation: Okay <english>it was Zulu, eh and Sotho, eh:</english> my mom is Sotho and my father is Zulu so we her language most of the time.\n        :\n', 'Participant: Interviewer\nTranslation: Mh, and so <english>you speak</english> <english>you speak</english>. What languages or language do you use with friends and why?\n        :\n', 'Participant: Participant\nTranslation:  Eh, isiZulu.\n']

Use whichever feels more comfortable. Be careful about reading large files all at once, though, as they may not fit into available memory.
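
If you go the regex route, named groups can also split each record into speaker and translation in one pass. The following is only a sketch under assumptions: the pattern RECORD_RE and the helper parse_records are hypothetical names, the record layout is taken from the excerpt in the question, and the sample text is shortened.

```python
import re

# One record: a 'Participant:' header line, then 'Translation:' and
# everything up to the next header line or the end of the text.
RECORD_RE = re.compile(
    r'^Participant:[ \t]*(?P<speaker>[^\n]*?)[ \t]*\n'
    r'Translation:[ \t]*(?P<text>.*?)'
    r'(?=^Participant:|\Z)',
    re.S | re.M,
)

def parse_records(text):
    """Return a list of (speaker, translation) tuples."""
    records = []
    for match in RECORD_RE.finditer(text):
        # Drop blank and bare ':' separator lines, then join the rest.
        lines = [ln.strip() for ln in match.group('text').splitlines()]
        body = ' '.join(ln for ln in lines if ln and ln != ':')
        records.append((match.group('speaker'), body))
    return records

text = (
    "Participant: Interviewer \n"
    "Translation: What languages do you use\n"
    "with friends and why? \n"
    "        :  \n"
    "Participant: Participant \n"
    "Translation:  Eh, isiZulu.\n"
)
records = parse_records(text)
```

This needs the whole text in memory, so the caveat about large files applies here as well.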

Answer 2

If the participant and interviewer header lines always take exactly one line and always look the same, then you could use something like this:

p_translation = ""
i_translation = ""
interviewer = False
for line in f:
    if line.startswith("Participant: Participant"):
        # This would be the place to process i_translation,
        # because the interviewer's translation has now been
        # fully read.
        interviewer = False
        p_translation = ""
    elif line.startswith("Participant: Interviewer"):
        # This would be the place to process p_translation,
        # because the participant's translation has now been
        # fully read.
        interviewer = True
        i_translation = ""
    else:
        if interviewer:
            i_translation += line
        else:
            p_translation += line
# After the loop, the final translation still needs to be processed.
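
If the goal is to end up with all interviewer and all participant translations collected separately, the same loop can be wrapped in a function that also flushes the final translation after the loop. This is a sketch only: translations_by_speaker and flush are hypothetical names, the removal of the 'Translation:' prefix and of the ':' separator lines is an assumption about the desired output, and the sample is a shortened stand-in for the real file.

```python
import io
from collections import defaultdict

def translations_by_speaker(lines):
    """Return {speaker: [translation, ...]} for an iterable of lines."""
    result = defaultdict(list)
    speaker = None
    chunk = []

    def flush():
        # Store the finished translation under the current speaker.
        if speaker is not None:
            text = " ".join(chunk)
            if text.startswith("Translation:"):
                text = text[len("Translation:"):].strip()
            result[speaker].append(text)

    for line in lines:
        if line.startswith("Participant: "):
            flush()
            speaker = line[len("Participant: "):].strip()
            chunk = []
        elif line.strip() not in ("", ":"):
            chunk.append(line.strip())
    flush()  # the last translation has no header line after it
    return dict(result)

sample = io.StringIO(
    "Participant: Interviewer \n"
    "Translation: What languages do you use at home? \n"
    "        :  \n"
    "Participant: Participant \n"
    "Translation: Okay, it was Zulu\n"
    "and Sotho. \n"
    "        :  \n"
    "Participant: Interviewer \n"
    "Translation: And with friends? \n"
)
by_speaker = translations_by_speaker(sample)
```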

2 answers

solution1
1 2014-05-27 08:54:45

solution2
0 ACCEPTED 2014-05-27 08:50:44
