How do I split a file in Python using a string as the identifier?

Question

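I have a huge text file that needs to be split into several files. The text file contains an identifier that marks where to split. Here is part of the text file: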

Comp MOFVersion 10.1
Copyright 1997-2006. All rights reserved.
-------------------------------------------------- 
Mon 11/19/2022 8:34:22.35 - Starting The Process... 
-------------------------------------------------- 

There are a lot of content here
...

exit 
--------------------- 
list volume 
list partition 
exit
--------------------- 

Volume 0 is the selected volume.

Disk ###  Status         Size     Free     Dyn  Gpt
--------  -------------  -------  -------  ---  ---
* Disk 0    Online          238 GB   136 GB        *

-------------------------------------------------- 
Tue 11/20/2022 8:34:22.35 - Starting The Process... 
-------------------------------------------------- 

There are a lot of content here
....
SERVICE_NAME: vds 
    TYPE               : 10  WIN32_OWN_PROCESS  
    STATE              : 1  STOPPED 
    WIN32_EXIT_CODE    : 0  (0x0)
    SERVICE_EXIT_CODE  : 0  (0x0)
    CHECKPOINT         : 0x0
    WAIT_HINT          : 0x0
--------------------- 
*exit /b 0 
File not found - *.*
0 File(s) copied

-------------------------------------------------- 
Wed 11/21/2022 8:34:22.35 - Starting The Process... 
-------------------------------------------------- 

There are a lot of content here

==========================================
Computer: .
==========================================
Active: True
DmiRevision: 0
list disk
exit
--------------------- 
*exit /b 0 

11/19/2021  08:34 AM    <DIR>          .
11/19/2021  08:34 AM    <DIR>          ..
11/19/2021  08:34 AM                 0 SL
               1 File(s)              0 bytes
               2 Dir(s)  80,160,923,648 bytes free

I want to split the file at each occurrence of the string "Starting The Process". So a text file like the example above would be split into 3 files, each with different content. For example:

file1
-------------------------------------------------- 
Mon 11/19/2022 8:34:22.35 - Starting The Process... 
-------------------------------------------------- 

There are a lot of content here
...

exit 
--------------------- 
list volume 
list partition 
exit
--------------------- 

Volume 0 is the selected volume.

Disk ###  Status         Size     Free     Dyn  Gpt
--------  -------------  -------  -------  ---  ---
* Disk 0    Online          238 GB   136 GB        *


file2
-------------------------------------------------- 
Tue 11/20/2022 8:34:22.35 - Starting The Process... 
-------------------------------------------------- 

There are a lot of content here
....
SERVICE_NAME: vds 
    TYPE               : 10  WIN32_OWN_PROCESS  
    STATE              : 1  STOPPED 
    WIN32_EXIT_CODE    : 0  (0x0)
    SERVICE_EXIT_CODE  : 0  (0x0)
    CHECKPOINT         : 0x0
    WAIT_HINT          : 0x0
--------------------- 
*exit /b 0 
File not found - *.*
0 File(s) copied

file3
-------------------------------------------------- 
Wed 11/21/2022 8:34:22.35 - Starting The Process... 
-------------------------------------------------- 

There are a lot of content here

==========================================
Computer: .
==========================================
Active: True
DmiRevision: 0
list disk
exit
--------------------- 
*exit /b 0 

11/19/2021  08:34 AM    <DIR>          .
11/19/2021  08:34 AM    <DIR>          ..
11/19/2021  08:34 AM                 0 SL
               1 File(s)              0 bytes
               2 Dir(s)  80,160,923,648 bytes free

This is what I tried:

logfile = "E:/DATA/result.txt"
with open(logfile, 'r') as text_file:
    lines = text_file.readlines()
    for line in lines:
        if "Starting The Process..." in line:
            print(line)

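I can only find the lines that contain the string, but I don't know how to get the content of each of the 3 sections and write it to a new file.

Is this possible in Python? Thanks for any advice.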

Answer 1

Well, if the file is small enough to fit comfortably into memory (say 1 GB or less), you can read the whole file into a string and then use re.findall:

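import re

with open('data.txt', 'r') as file:
    data = file.read()

# A section header is a dash rule followed by a line starting with a date, e.g. "Mon 11/19/2022".
header = r'-{10,}[^-\n]*\n\w{3} \d{2}/\d{2}/\d{4}'
# Match from one header up to the next header (or the end of the data), so that
# shorter dash rules inside a section do not cut the section short.
parts = re.findall(header + r'.*?(?=' + header + r'|\Z)', data, flags=re.S)

# Write each section to its own file: file1.txt, file2.txt, ...
for cnt, part in enumerate(parts, start=1):
    with open(f'file{cnt}.txt', 'w') as output:
        output.write(part)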
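
The question describes the file as huge, so reading it all into memory may not always be an option. Below is a minimal sketch of a line-by-line variant. It assumes every marker line is directly preceded by its dash rule, as in the sample. The names `split_log` and `out_dir` and the `fileN.txt` naming are hypothetical, chosen only for illustration.

```python
import os

MARKER = "Starting The Process"  # marker string taken from the question


def split_log(src_path, out_dir, marker=MARKER):
    """Stream src_path and write one file per section; return the paths written."""
    paths = []
    out = None   # file for the section currently being written
    prev = None  # one line held back, because the dash rule above a marker
                 # line belongs to the section that marker starts
    try:
        with open(src_path, "r") as src:
            for line in src:
                if marker in line:
                    # A new section begins: close the old file, open the next.
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"file{len(paths) + 1}.txt")
                    paths.append(path)
                    out = open(path, "w")
                if out is not None and prev is not None:
                    out.write(prev)
                prev = line
        if out is not None and prev is not None:
            out.write(prev)  # flush the final held-back line
    finally:
        if out is not None:
            out.close()
    return paths
```

Calling `split_log("E:/DATA/result.txt", "E:/DATA")` would then write `file1.txt`, `file2.txt`, and so on into that folder. Anything before the first header (the copyright preamble in the sample) is skipped, and only one line is held in memory at a time.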

Answer 2

If the dash lines in the file have a fixed length, another solution could be:

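with open('file.txt', 'r') as f:
    split_text = f.read().split('--------------------------------------------------')
split_text.pop(0)  # Remove the copyright preamble before the first separator

# The pieces now alternate: timestamp line, section body, timestamp line, ...
for i in range(0, len(split_text) - 1, 2):
    with open(f'file{i // 2}.txt', 'w') as temp:
        # Rejoin each timestamp line with its body (the dash lines themselves are not written)
        temp_txt = ''.join(split_text[i:i+2])
        temp.write(temp_txt)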

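Essentially, I just split on those dash lines and join each consecutive pair of elements. This way, the timestamp information is preserved in the content of each file.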
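
If the dash lines should stay in the output, or their length is not fixed, one possible variation is to split with `re.split` on a zero-width lookahead (this needs Python 3.7 or later). This is a sketch that assumes a header is a rule of at least 10 dashes followed by a line starting with a weekday and date, as in the sample. The names `split_sections` and `write_sections` are hypothetical helpers, not part of the answer above.

```python
import os
import re

# Assumption: a section header is a rule of 10+ dashes followed by a line that
# starts with a weekday and date such as "Mon 11/19/2022" (as in the sample).
HEADER_START = re.compile(r"^(?=-{10,}[ \t]*\n\w{3} \d{2}/\d{2}/\d{4} )", flags=re.M)


def split_sections(text):
    """Return the sections of text, each starting at its own dash rule."""
    # Splitting on a zero-width match keeps the separator lines in the output.
    chunks = HEADER_START.split(text)
    # chunks[0] is whatever precedes the first header (possibly empty).
    return chunks[1:]


def write_sections(sections, out_dir):
    """Write sections to file1.txt, file2.txt, ... and return the paths."""
    paths = []
    for number, section in enumerate(sections, start=1):
        path = os.path.join(out_dir, f"file{number}.txt")
        with open(path, "w") as out:
            out.write(section)
        paths.append(path)
    return paths
```

Because the split point is the start of the opening dash rule, each section keeps both dash lines around its timestamp, and the shorter dash rules inside a section are not treated as boundaries.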

Solution 1
1 2022-11-22 09:38:12

Solution 2
0 2022-11-22 09:54:42
