This is a portion of a text file I have:
Participant: Interviewer
Translation: <english>Mhlongo.</english> Okay Monde, what languages do you typically use with your family and why?
:
Participant: Participant
Translation: Okay <english>it was Zulu, eh and Sotho, eh:</english> my mom is Sotho and my father is Zulu so we her language most of the time.
:
Participant: Interviewer
Translation: Mh, and so <english>you speak</english> <english>you speak</english>. What languages or language do you use with friends and why?
:
Participant: Participant
Translation: Eh, isiZulu.
I am trying to iterate through to get participant and interviewer translations. This is the code I have for it.
while True:
    interviewer = f.readline()
    interviewer_translation = f.readline()
    participant = f.readline()
    participant_translation = f.readline()
    ...
    if not participant_translation: break
However, this code reads exactly one line per field, which doesn't work because a translation sometimes spans two or more lines. Is there a way I can do it without having to use readline?
You can read the file line by line (iterating over f has the same effect as calling f.readline() repeatedly), concatenate lines until the next record delimiter, and then process the concatenated chunk, e.g.:
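def process(participant, translation):
    pass  # placeholder: handle one complete record here

participant = None
translation = ''
for line in f:
    if line.startswith('Participant: '):
        # A new record starts, so the previous one is complete.
        if participant:
            process(participant, translation)
        participant = line
        translation = ''
    elif participant:
        # 'Translation: ' line or a continuation of a multi-line translation.
        translation += line
# Process the last record, which has no following header line.
if participant:
    process(participant, translation)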
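The same accumulate-until-the-next-header idea can also be wrapped in a generator, so that the caller simply loops over finished records. This is only a sketch: `iter_records` is a made-up name, and it assumes that every record starts with a `Participant: ` line, that the text starts on a `Translation: ` line, and that lines holding only `:` are separators that can be dropped.

```python
def iter_records(lines):
    """Yield (speaker, translation) tuples from an iterable of lines.

    Assumptions (not confirmed by the question): each record starts with a
    'Participant: ' line, and a line holding only ':' is a separator.
    """
    speaker = None
    parts = []
    for raw in lines:
        line = raw.strip()
        if line.startswith('Participant: '):
            # A new header closes the previous record, if there was one.
            if speaker is not None:
                yield speaker, ' '.join(parts)
            speaker = line[len('Participant: '):]
            parts = []
        elif speaker is None or line in ('', ':'):
            # Skip text before the first record, blank lines and separators.
            continue
        elif line.startswith('Translation: '):
            parts.append(line[len('Translation: '):])
        else:
            # Continuation of a translation that spans several lines.
            parts.append(line)
    # The last record has no following header, so emit it here.
    if speaker is not None:
        yield speaker, ' '.join(parts)
```

With an open file f, a loop such as `for speaker, translation in iter_records(f):` would then take the place of the four readline() calls.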
Or you can use f.read(size) to read a bigger chunk of the file, or the whole file if the size argument is omitted:
>>> f.read()
'This is the entire file.\n'
Then you can use a multiline regex to extract meaningful chunks of text from it, e.g. entire records:
>>> re.findall(r'(?P<record>^Participant:.*?)(?=^Participant:|\Z)', text, re.S | re.M)
['Participant: Interviewer\nTranslation: <english>Mhlongo.</english> Okay Monde, what languages do you typically use with your family and why?\n :\n', 'Participant: Participant\nTranslation: Okay <english>it was Zulu, eh and Sotho, eh:</english> my mom is Sotho and my father is Zulu so we her language most of the time.\n :\n', 'Participant: Interviewer\nTranslation: Mh, and so <english>you speak</english> <english>you speak</english>. What languages or language do you use with friends and why?\n :\n', 'Participant: Participant\nTranslation: Eh, isiZulu.\n']
Use whichever feels more comfortable for you. Be careful with reading large files at once, though, as they may not fit into available memory.
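If separate speaker and translation fields are needed rather than whole records, the regex can capture them as two named groups. The following is a rough sketch, not a tested solution for the real file: `RECORD` and `parse_records` are hypothetical names, and it assumes header lines start at the beginning of a line and that lines holding only `:` are separators.

```python
import re

# One record: a 'Participant:' header line, then a 'Translation:' body that
# may span several lines, ending just before the next header or end of text.
RECORD = re.compile(
    r'^Participant:[ \t]*(?P<speaker>[^\n]*)\n'
    r'[ \t]*Translation:[ \t]*(?P<translation>.*?)'
    r'(?=^Participant:|\Z)',
    re.S | re.M,
)

def parse_records(text):
    """Return a list of (speaker, translation) tuples found in text."""
    records = []
    for match in RECORD.finditer(text):
        body_lines = [ln.strip() for ln in match.group('translation').splitlines()]
        # Drop blank lines and ':' separators (assumed to carry no content).
        body = ' '.join(ln for ln in body_lines if ln and ln != ':')
        records.append((match.group('speaker').strip(), body))
    return records
```

Like the re.findall call above, this needs the whole text in memory, so the same caution about large files applies.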
If the participant and interviewer header lines always take exactly one line and always look the same, you could use something like this:
p_translation = ""
i_translation = ""
interviewer = False
for line in f:
    if line.startswith("Participant: Participant"):
        # This would be the place to process i_translation,
        # because the interviewer's translation has now been
        # fully read
        interviewer = False
        p_translation = ""
    elif line.startswith("Participant: Interviewer"):
        # This would be the place to process p_translation,
        # because the participant's translation has now been
        # fully read
        interviewer = True
        i_translation = ""
    else:
        if interviewer:
            i_translation += line
        else:
            p_translation += line
# After the loop, the last translation read still needs to be processed.
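Since the question asks for the interviewer and participant translations together, the collected turns can afterwards be paired up as question and answer. A minimal sketch, assuming the turns are already available as (speaker, translation) tuples and that the speakers are labelled exactly `Interviewer` and `Participant`; `pair_turns` is a hypothetical helper name.

```python
def pair_turns(records):
    """Pair each interviewer turn with the participant turn that follows it.

    An interviewer turn with no answer is paired with None.
    """
    pairs = []
    question = None
    for speaker, translation in records:
        if speaker == 'Interviewer':
            if question is not None:
                # The previous question never got an answer.
                pairs.append((question, None))
            question = translation
        elif speaker == 'Participant' and question is not None:
            pairs.append((question, translation))
            question = None
    # A trailing question without an answer is kept as well.
    if question is not None:
        pairs.append((question, None))
    return pairs
```

Participant turns that are not preceded by a question are silently ignored here; whether that is the right choice depends on the transcript.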