
Extract text between every occurrence of two substrings in a text file

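Below is some sample text from a log file. I need to extract all the text between each occurrence of "Event uploaded" and the next "}" that follows it. I've also added an example of what I need returned (note that this is just an example; I'll apply the method to a more general case). My output format also isn't great, it's just an idea. Anything close is fine, I can handle the formatting from there; the content is what matters most: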

Input:

2019-06-28 15:02:09:918 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Activate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:920 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventIdleSleep, preventSuspendOnSleep (assertion 0x11ff1e710 added: preventIdleSleep; removed: (none))
2019-06-28 15:02:09:921 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10108]
2019-06-28 15:02:09:921 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Creating PowerAssertion on abc-rrre:365
2019-06-28 15:02:09:922 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Sleep revert state: 1
2019-06-28 15:02:09:926 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Created SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:926 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Created PowerAssertion on abc-rrre:365, sleep reverted
2019-06-28 15:02:09:926 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Client relinquished <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:927 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Deactivate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:928 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventSuspendOnSleep (assertion 0x11ff1e710 added: (none); removed: preventIdleSleep)
2019-06-28 15:02:09:929 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10100]
2019-06-28 15:02:09:929 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Releasing PowerAssertion on abc-rrre:365 from update
2019-06-28 15:02:09:930 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Remove assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:931 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Released SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:932 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: -[BKAssertion dealloc] - <0x11ff1e710>
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = 1234567890;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “l323f123f”;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = NA;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = source;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “lasdf23f23”;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:936 - info: [bUSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAdditional : {
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add1 = value;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add2 = false;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     “tsn” = “g254g34gg4g”;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     "time_zone" = EDT;
2019-06-28 15:02:09:938 - info: [bUSLog] [IOS_SYSLOG_ROW] }

Desired output:

ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = 1234567890; pop = abc; origin = target; "tsn" = “l323f123f”;}
ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = NA; pop = abc; origin = source; "tsn" = “lasdf23f23”;}
ABCAdditional : { add1 = value; add2 = false; pop = abc; origin = target; “tsn” = “g254g34gg4g”; "time_zone" = EDT;}

I've tried using:

   start = 'Event uploaded, '
   end = '}'
   new = entry[entry.find(start)+len(start):entry.rfind(end)]

and several other approaches (including regex), without much luck... Any help would be greatly appreciated, thanks!
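The slice above can return at most one piece of text, running from the first start marker to the last "}" in the whole string, because str.rfind searches from the end. A minimal sketch of the same find-based idea done in a loop, assuming the log has already been read into a single string: str.find accepts a start offset, so each search resumes where the previous block ended and always stops at the first "}" after the current marker. The helper name between_all and the sample text are hypothetical, and the log prefixes of the inner lines are not removed here.

```python
def between_all(text, start, end):
    """Return every substring between `start` and the first `end` after it.

    The closing `end` is kept so each block still ends with '}'.
    """
    results = []
    pos = 0
    while True:
        i = text.find(start, pos)      # next occurrence of the start marker
        if i == -1:
            break
        i += len(start)
        j = text.find(end, i)          # FIRST end marker after it (not rfind)
        if j == -1:
            break
        results.append(text[i:j + len(end)])
        pos = j + len(end)             # resume searching after this block
    return results


# Hypothetical sample in the same shape as the log (prefixes omitted)
sample = ("x Event uploaded, ABCAccount : {\n    dcis = 0;\n}\n"
          "y Event uploaded, ABCAdditional : {\n    add1 = value;\n}\n")
for block in between_all(sample, 'Event uploaded, ', '}'):
    print(' '.join(block.split()))     # collapse newlines and indentation
# ABCAccount : { dcis = 0; }
# ABCAdditional : { add1 = value; }
```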

Edit (attempt):

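import sys

with open(target_logs) as log:  # target_logs: path to the log file, defined elsewhere
    do_print = False
    event_key = 'Event uploaded,'

    for line in log:
        line = line.strip()
        if do_print:
            # keep only the text after the last ']' (drops the "[..._SYSLOG_ROW]" prefix)
            sys.stdout.write(line[line.rfind(']') + 1:].strip() + ' ')
        if event_key in line:
            do_print = True
            sys.stdout.write(line[line.find(event_key) + len(event_key):].strip() + ' ')
        elif do_print and line.endswith('}'):  # a closing brace ends the current block
            do_print = False
            sys.stdout.write('\n')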

The answer's approach also picks up lines like these:

2019-06-28 15:02:11:672 - info: [bUSLog] [BUS_SYSLOG_ROW] Jun 28 11:02:11 device--Target sharingd(WirelessProximity)[57] <Notice>: Nearby start scanning with data: scan request of type 16, blob: <>, mask <>, active: 0, duplicates: 0, screen on: 300, screen off: 300, rssi: -60, peers: (
2019-06-28 15:02:11:672 - info: [bUSLog] [BUS_SYSLOG_ROW]     "1A02F1A8-5597-4B1F-8802-BA022F789F81",
2019-06-28 15:02:11:673 - info: [bUSLog] [BUS_SYSLOG_ROW]     "A80A3D54-F8F2-D96B-598B-3EF0AE3ABC70",
2019-06-28 15:02:11:673 - info: [bUSLog] [BUS_SYSLOG_ROW]     "B4F0AC04-4A06-92EB-AA85-32002E6675BC",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "D9A5686A-C971-ADEB-A33F-2C772F351D45",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "5B66FA21-AA48-66D8-A619-1C0EA9190597",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "C540AC68-57DF-DA13-3C73-1129E2DD5A6D",
2019-06-28 15:02:11:674 - info: [bUSLog] [BUS_SYSLOG_ROW]     "CCD3C7C8-5069-C9C7-D4B5-FCAD9C2FA15F",
2019-06-28 15:02:11:675 - info: [bUSLog] [BUS_SYSLOG_ROW]     "E6A1699E-91BC-AEB1-DE99-C7C0FB440FAA",
2019-06-28 15:02:11:675 - info: [bUSLog] [BUS_SYSLOG_ROW]     "01480FF0-CD8D-C505-524D-CC139711A730"
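Once the log prefix is removed, lines like these consist of nothing but a double-quoted string and an optional trailing comma, so one way to drop them is a multiline regex that matches exactly that shape while leaving "key" = value; lines alone. A minimal sketch on a hand-made sample (the variable names are hypothetical):

```python
import re

# Lines as they might look after the log prefixes are removed (illustrative sample)
text = '\n'.join([
    '"1A02F1A8-5597-4B1F-8802-BA022F789F81",',
    '"01480FF0-CD8D-C505-524D-CC139711A730"',
    '"tsn" = "l323f123f";',
    '}',
])

# A whole line that is one double-quoted string, optionally followed by a comma
QUOTED_ONLY = re.compile(r'^"[^"]+",?$', flags=re.M)

cleaned = QUOTED_ONLY.sub('', text)
print([line for line in cleaned.splitlines() if line])
# ['"tsn" = "l323f123f";', '}']
```

The "tsn" line survives because, after the closing quote of "tsn", the pattern requires an optional comma and then the end of the line.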

Answer: As a first step, we do a substitution (regex101), then split after each }\n and remove the newlines:

data = r'''2019-06-28 15:02:09:918 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Activate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:920 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventIdleSleep, preventSuspendOnSleep (assertion 0x11ff1e710 added: preventIdleSleep; removed: (none))
2019-06-28 15:02:09:921 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10108]
2019-06-28 15:02:09:921 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Creating PowerAssertion on abc-rrre:365
2019-06-28 15:02:09:922 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Sleep revert state: 1
2019-06-28 15:02:09:926 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Created SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:926 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Created PowerAssertion on abc-rrre:365, sleep reverted
2019-06-28 15:02:09:926 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Client relinquished <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:927 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Deactivate assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:928 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] New process assertion state; preventSuspend, preventThrottleDownUI, preventThrottleDownCPU, preventSuspendOnSleep (assertion 0x11ff1e710 added: (none); removed: preventIdleSleep)
2019-06-28 15:02:09:929 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Setting jetsam priority to 10 [0x10100]
2019-06-28 15:02:09:929 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: Releasing PowerAssertion on abc-rrre:365 from update
2019-06-28 15:02:09:930 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: [abc-rrre:365] Remove assertion: <BKProcessAssertion: 0x11ff1e710; "Shared Background Assertion 737 for el.defg.na.abcrrre2" (finishTask:180s); id:\M-b\M^@\M-&988DD2F10162>
2019-06-28 15:02:09:931 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target powerd[35] <Notice>: Process assertiond.62 Released SystemIsActive "abc-rrre:365:365-6E62D75B-8078-47DE-9B22-988DD2F10162 [Shared Background Assertion 737 for el.defg.na.abcrrre2] [0x11ff1e710]" age:00:00:00  id:51539643064 [System: SysAct]
2019-06-28 15:02:09:932 - info: [iOSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: -[BKAssertion dealloc] - <0x11ff1e710>
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:933 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = 1234567890;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “l323f123f”;
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     dcis = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     ttl = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     bb = 0;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = NA;
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = source;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     "tsn" = “lasdf23f23”;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW] }
2019-06-28 15:02:09:936 - info: [bUSLog] [IOS_SYSLOG_ROW] Jun 28 11:02:09 device—Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAdditional : {
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add1 = value;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     add2 = false;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     pop = abc;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     origin = target;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     “tsn” = “g254g34gg4g”;
2019-06-28 15:02:09:937 - info: [bUSLog] [IOS_SYSLOG_ROW]     "time_zone" = EDT;
2019-06-28 15:02:09:938 - info: [bUSLog] [IOS_SYSLOG_ROW] }'''

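import re

# Remove each line's log prefix; header lines (capitalised text) are cut back to "Event uploaded," or emptied
data = re.sub(r'^.*SYSLOG_ROW\]\s*(?:[A-Z].+?(?=Event uploaded,|$))?', r'', data, flags=re.M)
# Drop lines that are only a quoted string plus an optional comma (e.g. UUIDs in a peers list)
data = re.sub(r'^"[^"]+",?$', r'', data, flags=re.M)
for chunk in re.split(r'(?<=})\n', data):  # split after every closing brace
    row = chunk.replace('\n', '')
    # str.lstrip() strips a set of characters, not a prefix, so cut at the marker instead
    _, marker, rest = row.partition('Event uploaded,')
    if marker:
        print(rest.strip())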

Prints:

ABCAccount : {dcis = 0;ttl = 0;bb = 0;r1 = 1234567890;pop = abc;origin = target;"tsn" = “l323f123f”;}
ABCAccount : {dcis = 0;ttl = 0;bb = 0;r1 = NA;pop = abc;origin = source;"tsn" = “lasdf23f23”;}
ABCAdditional : {add1 = value;add2 = false;pop = abc;origin = target;“tsn” = “g254g34gg4g”;"time_zone" = EDT;}
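Cutting the marker off with str.lstrip('Event uploaded,') only works on this sample by luck: lstrip treats its argument as a set of characters to remove, not as a prefix, so it keeps stripping until it meets a character outside that set. A quick check of the difference, using a hypothetical event name deviceInfo that starts with letters from the set; str.partition (or str.removeprefix on Python 3.9+) cuts at the exact substring instead:

```python
marker = 'Event uploaded,'

ok = 'Event uploaded, ABCAccount : {'
bad = 'Event uploaded, deviceInfo : {'   # hypothetical event name

# str.lstrip() removes any leading characters that appear in `marker`
print(ok.lstrip(marker))    # ABCAccount : {
print(bad.lstrip(marker))   # iceInfo : {

# str.partition() splits on the exact substring instead
print(bad.partition(marker)[2].strip())   # deviceInfo : {
```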

Edit (reading from a file):

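import re

# UTF-8 is an assumption; the log contains curly quotes, so set the encoding explicitly
with open('log.txt', 'r', encoding='utf-8') as f_in:
    data = f_in.read()

data = re.sub(r'^.*SYSLOG_ROW\]\s*(?:[A-Z].+?(?=Event uploaded,|$))?', r'', data, flags=re.M)
data = re.sub(r'^"[^"]+",?$', r'', data, flags=re.M)
for chunk in re.split(r'(?<=})\n', data):
    row = chunk.replace('\n', '')
    # str.lstrip() strips a set of characters, not a prefix, so cut at the marker instead
    _, marker, rest = row.partition('Event uploaded,')
    if marker:  # also skips the empty tail left by the file's final newline
        print(rest.strip())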
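A more direct way to grab "everything between each Event uploaded, and the next }" is a lazy regex with re.DOTALL and re.findall: .*? stops at the first "}" after each marker, even across lines. This is a sketch rather than the answer's method, and it assumes no "}" appears in the log prefixes inside a block. The function name extract_events and the shortened SAMPLE are hypothetical; for the real file, pass it the data string read in the snippet above.

```python
import re


def extract_events(text, start='Event uploaded,', end='}'):
    """Sketch: collect every `start ... end` block and join its lines."""
    # Lazy (.*?) plus DOTALL: match across lines, stop at the FIRST `end`.
    pattern = re.escape(start) + r'\s*(.*?' + re.escape(end) + r')'
    rows = []
    for block in re.findall(pattern, text, flags=re.S):
        # Drop the "... [xxx_SYSLOG_ROW]" prefix from every continuation line.
        block = re.sub(r'^.*SYSLOG_ROW\][ \t]*', '', block, flags=re.M)
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        rows.append(' '.join(lines))
    return rows


SAMPLE = """\
2019-06-28 15:02:09:932 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target assertiond[62] <Notice>: -[BKAssertion dealloc] - <0x11ff1e710>
2019-06-28 15:02:09:934 - info: [bUSLog] [bUS_SYSLOG_ROW] Jun 28 11:02:09 device--Target ABC-DEF[365] Notice: -[sendAllDataToServerWithDebug:] [Line 255] Event uploaded, ABCAccount : {
2019-06-28 15:02:09:935 - info: [bUSLog] [bUS_SYSLOG_ROW]     r1 = NA;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW]     origin = source;
2019-06-28 15:02:09:936 - info: [bUSLog] [bUS_SYSLOG_ROW] }
"""

for row in extract_events(SAMPLE):
    print(row)   # ABCAccount : { r1 = NA; origin = source; }
```

Unlike the split-based version, lines outside a block (such as the peers list) are never captured, so they need no separate filtering as long as they do not fall between a marker and its closing brace.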

Iterative approach (for Python 3.x):

with open('log.txt') as log:
    do_print = False
    event_key = 'Event uploaded,'  # starting marker

    for line in log:
        line = line.strip()
        if do_print:
            # keep only the text after the last ']' (drops the "[..._SYSLOG_ROW]" prefix)
            print(line[line.rfind(']') + 1:].strip(), end=' ')
        if event_key in line:
            do_print = True
            print(line[line.find(event_key) + len(event_key):].strip(), end=' ')
        elif do_print and line.endswith('}'):  # a closing brace ends the current block
            do_print = False
            print()

Output:

ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = 1234567890; pop = abc; origin = target; "tsn" = “l323f123f”; } 
ABCAccount : { dcis = 0; ttl = 0; bb = 0; r1 = NA; pop = abc; origin = source; "tsn" = “lasdf23f23”; } 
ABCAdditional : { add1 = value; add2 = false; pop = abc; origin = target; “tsn” = “g254g34gg4g”; "time_zone" = EDT; } 

For older Python versions (Python 2), you can use sys.stdout.write instead of print(..., end=' ').
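If the rows are needed as data rather than printed, the same state machine can be wrapped in a generator that yields one joined string per block. This is a sketch under the same assumptions as the loop above (a block starts on the line containing the marker, ends on the first line ending in "}", and the log prefix ends at the last "]" on each line); the name iter_events and the sample lines are hypothetical. A file object can be passed directly, since iterating it yields lines, e.g. for row in iter_events(log).

```python
def iter_events(lines, event_key='Event uploaded,'):
    """Yield one space-joined string per 'Event uploaded, ... }' block."""
    parts = None                          # None means: currently outside a block
    for raw in lines:
        line = raw.strip()
        if parts is None:
            if event_key in line:         # start of a block
                parts = [line[line.find(event_key) + len(event_key):].strip()]
        else:
            # keep only the text after the last ']' (the "[..._SYSLOG_ROW]" prefix)
            parts.append(line[line.rfind(']') + 1:].strip())
            if line.endswith('}'):        # end of the block
                yield ' '.join(parts)
                parts = None


sample = [
    '... [IOS_SYSLOG_ROW] Jun 28 11:02:09 device--Target ABC-DEF[365] Notice: [Line 255] Event uploaded, ABCAdditional : {',
    '... [IOS_SYSLOG_ROW]     add1 = value;',
    '... [IOS_SYSLOG_ROW]     "time_zone" = EDT;',
    '... [IOS_SYSLOG_ROW] }',
    '... [IOS_SYSLOG_ROW] Jun 28 11:02:11 device--Target sharingd[57] <Notice>: unrelated line }',
]
for row in iter_events(sample):
    print(row)   # ABCAdditional : { add1 = value; "time_zone" = EDT; }
```

Keeping the collected parts in a list (instead of a do_print flag) means an unrelated line that happens to end in "}" outside a block is simply ignored.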


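Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.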

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM