
Identifying two consecutive identical lines and replacing the first one

I have the following input file structure, with text on each line:

line1
line2
line3
line3
line4
line5
line6

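When two lines are exactly the same (here, line3), I want to keep the second one and change the content of the first one to "SECTION MISSING". I can't get the marker into the right place. The closest I've got is the code below, but the output I get is: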

line1
line2
line3
SECTION MISSING
line4
etc.

whereas I want:

line1
line2
SECTION MISSING
line3 
line4

Code:

def uniq(iterator):
    previous = float("NaN")  # Not equal to anything
    section = "SECTION : MISSING\n"
    for value in iterator:
        if previous == value:
            yield section
        else:
            yield value
            previous = value

with open('infile.txt', 'r') as file:
    with open('outfile.txt', 'w') as f:
        for line in uniq(file):
            f.write(line)

I think you want to yield the previous line (previous) rather than the current one (value):

def uniq(iterator):
    previous = None
    section = ("SECTION : MISSING\n")
    for value in iterator:
        if previous == value:
            yield section
        elif previous is not None:
            yield previous
        previous = value
    if previous is not None:
        yield previous

Example usage:

>>> list(uniq([1, 2, 2, 3, 4, 5, 6, 6]))
[1, 'SECTION : MISSING\n', 2, 3, 4, 5, 'SECTION : MISSING\n', 6]
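To check uniq against the exact sample from the question without touching the disk, a minimal sketch can feed it an in-memory file. The io.StringIO wrapper and the SAMPLE name are illustrative assumptions; with a real file you would pass the open file object instead, as in the question's own loop.

```python
import io

def uniq(iterator):
    """Yield each line, replacing the first of two identical consecutive lines."""
    previous = None
    section = "SECTION : MISSING\n"
    for value in iterator:
        if previous == value:
            # The pending line equals the current one: emit the marker in its place.
            yield section
        elif previous is not None:
            yield previous
        previous = value
    if previous is not None:
        # The last line has no successor, so it is always kept as-is.
        yield previous

# Hypothetical in-memory stand-in for infile.txt, using the question's sample lines.
SAMPLE = "line1\nline2\nline3\nline3\nline4\nline5\nline6\n"

result = "".join(uniq(io.StringIO(SAMPLE)))
print(result, end="")
```

This prints line1, line2, SECTION : MISSING, line3, line4, line5, line6, one per line. With three identical lines in a row, this version replaces every copy except the last (list(uniq(["x\n", "x\n", "x\n"])) gives two markers followed by "x\n"), which is the case the groupby answer handles differently.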

Something like:

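prev = None
with open('infile.txt', 'r') as fi:
    with open('outfile.txt', 'w') as fo:
        for line in fi:
            if prev is not None:
                # Write the previous line, or the marker if it repeats in this line
                fo.write(prev if prev != line else "SECTION : MISSING\n")
            prev = line
        if prev is not None:  # guard against an empty input file
            fo.write(prev)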

will give you the output file you're looking for:

line1
line2
SECTION : MISSING
line3
line4
line5
line6

As a personal preference, for these tasks I use two cursors instead of one:

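from itertools import tee

with open('infile.txt') as r, open('outfile.txt', 'w') as w:
    p, c = tee(r)         # two independent cursors over the same lines
    last = next(c, None)  # advance c so it runs one line ahead of p
    for prev, cur in zip(p, c):
        # Write the earlier line of each pair, or the marker if the pair matches
        w.write(prev if prev != cur else 'SECTION : MISSING\n')
        last = cur
    if last is not None:
        w.write(last)     # the final line has no successor, so write it as-is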

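If you ever have three identical consecutive lines (or, more generally, two or more) and only need to replace the first one, you can handle that case with groupby: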

from itertools import groupby, islice, chain

def detect_missing(source):
    grouped = groupby(source)
    section = "SECTION: MISSING\n"
    for _, group in grouped:
        first_two = list(islice(group, 2))
        if len(first_two) > 1:
            first_two[0] = section
        yield from chain(first_two, group)

(This is Python 3; on Python 2, replace yield from with a for loop that yields each item.)
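A quick way to see how this differs from the pairwise versions is to run detect_missing on a run of three identical lines: only the first copy in each run is replaced. The sample list below is made up for illustration; with files you would pass the open input file and write the result with writelines.

```python
from itertools import chain, groupby, islice

def detect_missing(source):
    """Replace the first line of every run of two or more identical lines."""
    section = "SECTION: MISSING\n"
    for _, group in groupby(source):
        # Pull at most two items to learn whether this run contains a repeat.
        first_two = list(islice(group, 2))
        if len(first_two) > 1:
            first_two[0] = section
        # Re-emit the (possibly patched) head, then whatever remains of the run.
        yield from chain(first_two, group)

# Illustrative input with a run of three identical lines.
lines = ["line1\n", "line2\n", "line3\n", "line3\n", "line3\n", "line4\n"]
print("".join(detect_missing(lines)), end="")
```

This prints line1, line2, SECTION: MISSING, line3, line3, line4. Note that this answer spells the marker "SECTION: MISSING" without the space before the colon that the other answers use.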


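Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.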

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM