
Python: how to get the needed lines from a file

I have a file like this (a chat log, about 5 GB):

XX, XX[14:59]: 
hello YY.
how are you today? 
YY, YY [14:59]: 
Hi
good, thank you, and you~
XX, XX[15:00]: 
thanks, I am good) 
YY, YY [15:00]: 
;)

I wrote a couple of regexes so I can detect when the speaker changes:

import re


def is_service_desk_line(line):
    # " *" allows zero or more spaces before "[" (the sample has "XX, XX[14:59]:")
    return re.match(r"XX, XX *\[[0-9]+:[0-9]+[ A-Z]*\]:", line)


def is_someone_else_line(line):
    return re.match(r"YY, YY *\[[0-9]+:[0-9]+[ A-Z]*\]:", line)
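A quick way to sanity-check header regexes like these is to run them against a few lines of the sample. The sample shows XX's header without a space before "[" (XX, XX[14:59]:) but YY's header with one, so this sketch assumes zero or more spaces (" *") in both patterns; the constant names are illustrative only.

```python
import re

# Assumption: zero or more spaces may precede "[" in a header line, since the
# sample contains both "XX, XX[14:59]:" and "YY, YY [14:59]:".
SERVICE_DESK_HEADER = re.compile(r"XX, XX *\[[0-9]+:[0-9]+[ A-Z]*\]:")
SOMEONE_ELSE_HEADER = re.compile(r"YY, YY *\[[0-9]+:[0-9]+[ A-Z]*\]:")

sample = [
    "XX, XX[14:59]: \n",
    "hello YY.\n",
    "YY, YY [14:59]: \n",
    "Hi\n",
]

for line in sample:
    # re.match only tries to match at the start of the line,
    # so a message such as "hello YY." is not treated as a header.
    is_xx = bool(SERVICE_DESK_HEADER.match(line))
    is_yy = bool(SOMEONE_ELSE_HEADER.match(line))
    print(is_xx, is_yy, repr(line))
```

Only the first sample line should report True in the first column, and only the third line True in the second column.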

I don't know how to read the file or how to set up the "trigger" (e.g. once a line matches the service desk line, start writing the following lines, which belong to user XX, until a line matches someone else's line).

I only need the message lines that belong to user XX.

I know how to read files in Python, but how can I set the "trigger"?

UPDATE:

The output I need is just the lines that belong to user XX:

hello YY.
how are you today?
thanks, I am good)

If I understand the question correctly, you can use the following:

with open("input_file") as f:  # closes the file automatically
    lines = f.readlines()

for line in lines:
    if is_service_desk_line(line):
        print("This is a service desk line", line)
    elif is_someone_else_line(line):
        print("This is someone else", line)
    else:
        print("This is neither", line)

If you only want the lines from user XX, just keep the if statement (without the elif/else branches); it will then print only the lines that match XX's header.

EDIT

To output only the lines from user XX you can use the following:

with open("input.txt") as f:  # closes the file automatically
    lines = f.readlines()

print_line = False

for line in lines:
    if is_someone_else_line(line):
        print_line = False

    if print_line:
        print(line)

    if is_service_desk_line(line):
        print_line = True

Output is:

hello YY.

how are you today? 

thanks, I am good) 
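The blank lines in that output appear because each line read from the file still ends in "\n" and print() adds another one. A minimal sketch, assuming the same header formats, that wraps the same flag logic in a generator and strips trailing whitespace so the result matches the output asked for in the question. The names XX_HEADER, YY_HEADER and xx_messages are hypothetical.

```python
import re

# Header patterns adapted from the question; " *" (an assumption) allows
# an optional space before "[".
XX_HEADER = re.compile(r"XX, XX *\[[0-9]+:[0-9]+[ A-Z]*\]:")
YY_HEADER = re.compile(r"YY, YY *\[[0-9]+:[0-9]+[ A-Z]*\]:")


def xx_messages(lines):
    """Yield XX's message lines with trailing whitespace removed.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    in_xx_block = False  # the "trigger": True while we are inside XX's turn
    for line in lines:
        if XX_HEADER.match(line):
            in_xx_block = True   # XX starts talking; skip the header itself
            continue
        if YY_HEADER.match(line):
            in_xx_block = False  # someone else starts talking
            continue
        if in_xx_block:
            yield line.rstrip()  # drop the newline and any trailing spaces


chat = """XX, XX[14:59]:
hello YY.
how are you today?
YY, YY [14:59]:
Hi
good, thank you, and you~
XX, XX[15:00]:
thanks, I am good)
YY, YY [15:00]:
;)
""".splitlines(keepends=True)

for message in xx_messages(chat):
    print(message)
```

Because xx_messages accepts any iterable, you can pass it an open file object instead of a list and it will still read the file one line at a time.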

I did it!

The code:

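def get_messages(w_file, r_file):
    trigger = ""            # last header line seen; decides whether to write
    someone_else_line = ""  # last header that belonged to someone other than XX
    # Open the output in text mode ("w", not "wb"): lines read from r_file are
    # str, and writing str to a binary file raises TypeError in Python 3.
    with open(w_file, "w") as w, open(r_file, "r") as r:
        for line in r:
            if is_service_desk_line(line):
                trigger = line
                continue
            elif is_someone_else_line(line) or is_different_support(line):
                # is_different_support() is another of the asker's helpers (not shown)
                trigger = line
                someone_else_line = trigger
                continue
            elif line.startswith("From"):
                # a line starting with "From" switches back to the last non-XX speaker
                trigger = someone_else_line
            elif is_service_desk_line(trigger):
                # we are inside an XX block, so keep this line
                w.write(line)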

Closed :)

Your file is quite large (about 5 GB), so I would not recommend reading the whole thing into memory with readlines(). Iterating over the file object instead reads it one line at a time.

A memory-efficient version would be along these lines:

out_flag = False

with open('input_file.txt', 'r') as in_file:
    for line in in_file:
        if is_service_desk_line(line):
            out_flag = True  # sets output flag
            continue  # skips that line
        if is_someone_else_line(line):
            out_flag = False  # clears output flag: someone else is talking
        if out_flag:
            print(line, end="")  # line already ends with "\n"
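The asker's own version writes the matching lines to a file rather than printing them. A minimal sketch combining that with the streaming loop above, assuming UTF-8 input and the same header formats; the function name extract_xx_lines and the file names are hypothetical.

```python
import os
import re
import tempfile

# Same header formats as in the question; " *" (an assumption) also accepts
# headers with no space before "[".
XX_HEADER = re.compile(r"XX, XX *\[[0-9]+:[0-9]+[ A-Z]*\]:")
YY_HEADER = re.compile(r"YY, YY *\[[0-9]+:[0-9]+[ A-Z]*\]:")


def extract_xx_lines(in_path, out_path, encoding="utf-8"):
    """Copy XX's message lines from in_path to out_path and return how many."""
    written = 0
    out_flag = False
    # Iterating over `src` reads one line at a time, so memory use does not
    # grow with the size of the log; `dst` is text mode because lines are str.
    with open(in_path, "r", encoding=encoding) as src, \
            open(out_path, "w", encoding=encoding) as dst:
        for line in src:
            if XX_HEADER.match(line):
                out_flag = True   # XX's turn starts; skip the header itself
                continue
            if YY_HEADER.match(line):
                out_flag = False  # someone else's turn starts
                continue
            if out_flag:
                dst.write(line)   # keep the original line ending
                written += 1
    return written


# Usage on a tiny throwaway file (file names are hypothetical):
with tempfile.TemporaryDirectory() as tmp:
    chat_path = os.path.join(tmp, "chat.txt")
    xx_path = os.path.join(tmp, "xx_only.txt")
    with open(chat_path, "w", encoding="utf-8") as f:
        f.write("XX, XX[14:59]:\nhello YY.\nYY, YY [14:59]:\nHi\n")
    print(extract_xx_lines(chat_path, xx_path))  # number of lines copied
    with open(xx_path, encoding="utf-8") as f:
        print(repr(f.read()))
```

Opening both files in a single with statement guarantees they are closed even if an exception is raised partway through a multi-gigabyte file.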

After you match XX's header line, you need to print every line until the next header line from another user. So you need a regex that matches any user's header, so that you know when to stop printing and start checking for XX again.
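One way to follow that advice is a single header regex that captures the speaker's name, so a header from any user ends the current block. A minimal sketch, assuming every header has the form "<name>[HH:MM]:" with an optional space before "[" and an optional suffix such as " PM"; HEADER and lines_from are hypothetical names. A message line that itself starts with something like "note [10:30]:" would be misread as a header, so the pattern is only as reliable as the log format.

```python
import re

# Assumed header format: "<speaker>[HH:MM]:" or "<speaker> [HH:MM PM]:".
# The lazy ".+?" stops at the first " *[" so the speaker name is captured.
HEADER = re.compile(r"(?P<speaker>.+?) *\[[0-9]+:[0-9]+[ A-Z]*\]:")


def lines_from(lines, wanted_speaker):
    """Yield the message lines written by wanted_speaker (headers skipped)."""
    current = None  # speaker of the most recent header seen so far
    for line in lines:
        m = HEADER.match(line)
        if m:
            # Any header, from any user, switches the current speaker.
            current = m.group("speaker")
            continue
        if current == wanted_speaker:
            yield line.rstrip()


chat = [
    "XX, XX[14:59]: \n",
    "hello YY.\n",
    "YY, YY [14:59]: \n",
    "Hi\n",
    "ZZ, ZZ [15:00]: \n",
    "a third user\n",
    "XX, XX[15:00]: \n",
    "thanks, I am good) \n",
]

print(list(lines_from(chat, "XX, XX")))
```

With this approach a third participant (ZZ above) also stops XX's block, without needing a separate regex per user.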

The technical posts on this site are published under the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM