
String.split ignore content inside square brackets with regex

I have a chatlog that is as follows:

12-09-18 00:31:40   @966 [playerwithoutspaces] to TEAM: Hello all
12-09-18 00:32:11   @966 [playerswith[inname] to ALL:   Helloall
12-09-18 00:30:15   @966 [player name with spaces] to ALL:  Hello all]

I'm trying to get date, time, id, name, to, chat and content with re.split(r"[\s\t](?![^[]*\])", line, 6), but it doesn't quite work. The problem is that if the content contains [ or ], the line isn't split properly.

So the result is:

['12-09-18', '00:30:15', '@966', '[player name with spaces] to ALL:\tHello all]', '']

When it should be:

['12-09-18', '00:30:15', '@966', '[player name with spaces]', 'to', 'ALL:', '\tHello all]']

I tried fiddling around with matching ] only a certain number of times, but that didn't work.

I forgot to mention that the content is preceded by either a tab (\t) or other whitespace (\s), so it varies.

Here is the code as requested:

import re

file = open("chatlog.txt", encoding="ANSI")
...
async def main():
    for line in file.readlines():
        await handle_chatlog_line(line)

async def handle_chatlog_line(line):
    # Raw strings keep the regex escapes from being read as string escapes.
    print(re.split(r"[\s\t](?![^[]*\])", line, 6))
    date, time, ingame_client_id, client_name, irrelevant, chat, content = re.split(r"[\s\t](?![^[]*\])", line, 6)

It crashes on the 3rd line of the chatlog: the regex is incorrect, so split doesn't produce enough items to unpack.
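To see why the unpacking fails, here is a minimal sketch that runs the same pattern on the first and third sample lines. The tab characters in the sample strings are an assumption (the post only says the separator is a tab or other whitespace), and the variable names are made up for illustration. The negative lookahead refuses to split on any whitespace that is followed later by a ] with no [ in between, so a trailing ] in the content suppresses every split after the id.

```python
import re

# Pattern from the question, written as a raw string.
PATTERN = r"[\s\t](?![^[]*\])"

# Sample lines from the question; the tabs are an assumption about the real log.
good_line = "12-09-18 00:31:40\t@966 [playerwithoutspaces] to TEAM: Hello all\n"
bad_line = "12-09-18 00:30:15\t@966 [player name with spaces] to ALL:\tHello all]\n"

good_parts = re.split(PATTERN, good_line, 6)
bad_parts = re.split(PATTERN, bad_line, 6)

# The first line splits into the 7 expected fields.
print(len(good_parts), good_parts)

# The third line ends in "]", so no whitespace after "@966 " is split on,
# except the final newline, which leaves an empty trailing item.
print(len(bad_parts), bad_parts)
```

With only 5 items in the second case, unpacking into 7 names raises a ValueError.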

I realized that split isn't a good fit in this case, so I ended up using re.match:

match = re.match(r"(\d\d-\d\d-\d\d \d\d:\d\d:\d\d)\s+(@\d+) \[(.+)\] to (TEAM|ALL):\s+(.+)", line)
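As a rough check, the sketch below applies that pattern to the three sample lines. The helper name parse_chatlog_line and the tabs in the sample strings are assumptions for illustration, not part of the original code. Note that this pattern captures date and time together as one group, and the name without its surrounding brackets.

```python
import re

# Pattern from the answer, as a raw string so the backslashes reach re intact.
LINE_RE = re.compile(
    r"(\d\d-\d\d-\d\d \d\d:\d\d:\d\d)\s+(@\d+) \[(.+)\] to (TEAM|ALL):\s+(.+)"
)

# Sample lines from the question; the exact whitespace is an assumption.
sample_lines = [
    "12-09-18 00:31:40\t@966 [playerwithoutspaces] to TEAM: Hello all",
    "12-09-18 00:32:11\t@966 [playerswith[inname] to ALL:\tHelloall",
    "12-09-18 00:30:15\t@966 [player name with spaces] to ALL:\tHello all]",
]


def parse_chatlog_line(line):
    """Return (timestamp, client_id, name, chat, content), or None if no match."""
    match = LINE_RE.match(line)
    return match.groups() if match else None


results = [parse_chatlog_line(line) for line in sample_lines]
for fields in results:
    print(fields)
```

Because (.+) is greedy, the name runs up to the last "] to TEAM:" or "] to ALL:" on the line, so brackets inside the name or the content are handled here. A chat message that itself contains "] to ALL:" would still be misparsed.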


 