I have a file like this (a chat log, about 5 GB):
XX, XX[14:59]:
hello YY.
how are you today?
YY, YY [14:59]:
Hi
good, thank you, and you~
XX, XX[15:00]:
thanks, I am good)
YY, YY [15:00]:
;)
I wrote some regexes so I can detect when the speaker changes:
import re

def is_service_desk_line(line):
    # " *" allows an optional space before "[", as in the sample ("XX, XX[14:59]:")
    return re.match(r"XX, XX *\[[0-9]+:[0-9]+[ A-Z]*\]:", line)

def is_someone_else_line(line):
    return re.match(r"YY, YY *\[[0-9]+:[0-9]+[ A-Z]*\]:", line)
I don't know how I should read the file and how to set the "trigger" (e.g. once a line matches the service desk header, start writing the following lines, which belong to user XX, until a line matches someone else's header).
I only need the message lines that belong to XX user.
I know how to read files in Python, but how can I set the "trigger"?
UPDATE:
The output I need is just the lines that belong to user XX:
hello YY.
how are you today?
thanks, I am good)
If I understand the question correctly you can use the following:
f = open("input_file")
lines = f.readlines()
f.close()
for line in lines:
    if is_service_desk_line(line):
        print("This is a service desk line", line)
    elif is_someone_else_line(line):
        print("This is someone else", line)
    else:
        print("This is neither", line)
If you only want the header lines of user XX, keep just the if branch (no elif/else); only the lines matching that user's header are then printed.
To output only the lines from user XX you can use the following:
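f = open("input.txt")
lines = f.readlines()
f.close()
print_line = False
for line in lines:
    if is_someone_else_line(line):
        print_line = False  # another user's header: stop printing
    if print_line:
        print(line, end="")  # line already ends with a newline
    if is_service_desk_line(line):
        print_line = True  # XX's header: print the lines that follow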
Output is:
hello YY.
how are you today?
thanks, I am good)
I did it!
The code:
def get_messages(w_file, r_file):
    trigger = ""
    someone_else_line = ""
    # Text mode: in Python 3, writing str lines to a file opened with "wb" raises TypeError.
    with open(w_file, "w") as w:
        with open(r_file, "r") as r:
            for line in r:
                if is_service_desk_line(line):
                    trigger = line
                    continue
                elif is_someone_else_line(line) or is_different_support(line):
                    trigger = line
                    someone_else_line = trigger
                    continue
                elif line.startswith("From"):
                    trigger = someone_else_line
                elif is_service_desk_line(trigger):
                    w.write(line)
Closed :)
You have quite a large file, so I would not recommend reading the whole thing into memory with readlines().
A memory efficient version of this would be along the lines of:
out_flag = False
with open('input_file.txt', 'r') as in_file:
    for line in in_file:
        if is_service_desk_line(line):
            out_flag = True  # start printing after this header
            continue  # skip the header line itself
        if is_someone_else_line(line):
            out_flag = False  # stop printing at another user's header
        if out_flag:
            print(line, end="")  # line already ends with a newline
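The same flag logic can also be wrapped in a generator, which makes it easy to try on a few sample lines and then stream the result straight into an output file. This is only a sketch: the names service_desk_messages and filter_file are made up for illustration, and the patterns allow an optional space before the bracket because the sample shows both "XX, XX[14:59]:" and "YY, YY [14:59]:".

```python
import re

# Header patterns based on the question; "XX, XX" and "YY, YY" are placeholders.
# " *" is an assumption: it tolerates zero or more spaces before "[".
SERVICE_DESK = re.compile(r"XX, XX *\[[0-9]+:[0-9]+[ A-Z]*\]:")
SOMEONE_ELSE = re.compile(r"YY, YY *\[[0-9]+:[0-9]+[ A-Z]*\]:")


def service_desk_messages(lines):
    """Yield only the message lines that follow a service desk header."""
    keep = False
    for line in lines:
        if SERVICE_DESK.match(line):
            keep = True       # header of XX: keep what follows
        elif SOMEONE_ELSE.match(line):
            keep = False      # header of someone else: drop what follows
        elif keep:
            yield line


def filter_file(in_path, out_path):
    """Stream in_path line by line and write the kept lines to out_path."""
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        dst.writelines(service_desk_messages(src))
```

Because the file object is iterated lazily and writelines consumes the generator one line at a time, only the current line needs to be held in memory, which matters for a 5 GB input.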
Once the current line matches the wanted user's header, you need to print every following line until another user's header appears. So you need a regex that matches any user's header, to know when to stop printing and start checking for the wanted user again.
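A rough sketch of that idea follows. It assumes every header looks like "name[HH:MM]:" or "name [HH:MM AM]:" alone on its own line; HEADER and messages_from are hypothetical names, not something from the question. A message line that happens to end in the same "[HH:MM]:" shape would be mistaken for a header, so the pattern may need tightening for real data.

```python
import re

# Assumed header shape: any name, optional spaces, then "[digits:digits ...]:".
HEADER = re.compile(r"^(?P<name>.+?) *\[[0-9]+:[0-9]+[ A-Z]*\]:\s*$")


def messages_from(lines, speaker):
    """Yield the message lines that belong to the given speaker."""
    current = None  # name from the most recent header seen
    for line in lines:
        m = HEADER.match(line)
        if m:
            current = m.group("name")  # any header switches the speaker
        elif current == speaker:
            yield line
```

With a single generic pattern there is no need to list every other user (or other support agents) separately: any header line switches the current speaker, and lines are kept only while that speaker is the one asked for, e.g. messages_from(in_file, "XX, XX").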