
`cat filename | grep -B 5 -C 5 foo`

import os

for filename in os.listdir("."):
    for line in open(filename):  # iterate lazily; xreadlines() is deprecated
        if "foo" in line:
            print line,  # trailing comma: the line already ends with a newline

So this is a simple Python equivalent of `cat filename | grep foo`. However, I would like the equivalent of `cat filename | grep -B 5 -C 5 foo` (five lines of context before and after each match). How should the above code be modified?

The simplest way is:

import os

for filename in os.listdir("."):
    lines = open(filename).readlines()
    for i, line in enumerate(lines):
        if "foo" in line:
            # clamp the start: a negative index would count from the end of the list
            for x in lines[max(i-5, 0) : i+6]:
                print x,

Add line numbers, breaks between blocks, etc., to taste ;-).
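As one possible take on those extras, here is a minimal sketch that is not part of the original answer. It is written for Python 3, whereas the other snippets in this thread use Python 2 print statements. The name `grep_context`, its `before`/`after` parameters, the `N:` line-number prefix, and the `--` separator are all choices made for illustration. It clamps the slice start at 0 and merges overlapping context so that no line is emitted twice.

```python
def grep_context(lines, needle, before=5, after=5):
    """Return numbered context lines around each match, with "--" between blocks."""
    out = []
    last = -1  # index of the last line already emitted
    for i, line in enumerate(lines):
        if needle not in line:
            continue
        start = max(i - before, 0)             # never let the index go negative
        stop = min(i + after + 1, len(lines))  # exclusive end of the context
        if last >= 0 and start > last + 1:
            out.append("--")                   # gap between two separate blocks
        # emit only the lines that an earlier block has not covered yet
        for j in range(max(start, last + 1), stop):
            out.append("%d:%s" % (j + 1, lines[j].rstrip("\n")))
        last = max(last, stop - 1)
    return out
```

Called as `grep_context(open(filename).readlines(), "foo")`, it returns the rows to print, one per output line. Like the snippet above, it still reads the whole file into memory.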

In the extremely unlikely case that you have to deal with absolutely humongous text files (ones over 200-300 times larger than the King James Bible, for example, which is about 4.3 MB in its entirety as a text file), I recommend a generator producing a sliding window (a "FIFO" of lines). For simplicity, the version below searches only the lines that have a full five lines on either side, skipping the first and last few lines of the file. Handling those requires a couple of special-case loops in addition, which is why I'm returning the index as well: it's not always 5 in those two extra loops!-)

import collections

def sliding_windows(it):
  it = iter(it)  # both loops below must consume the same stream, even for a list
  fifo = collections.deque()
  # prime the FIFO with the first 10 lines
  for i, line in enumerate(it):
    fifo.append(line)
    if i == 9: break
  # keep yielding 11-line sliding windows; index 5 is the middle line
  for line in it:
    fifo.append(line)
    yield fifo, 5
    fifo.popleft()

for w, i in sliding_windows(open(filename)):
  if "foo" in w[i]:
    for line in w: print line,

I think I'll leave the special-case loops (and worries about files of very few lines;-) as exercises, since the whole thing is so incredibly hypothetical anyway.

Just a few hints: the closing "special-case loop" is really simple. Just repeatedly drop the first line, without appending, obviously, as there's nothing more to append. The index should still always be 5, and you're done when you've just yielded a window where 5 is the last index (i.e., the last line of the file). The starting case is a tad subtler, as you don't want to yield until you've read the first 6 lines, and at that point the index will be 0 (the first line of the file).

Finally, for extra credit, consider how to make this work on very short files, too!-)
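For readers who want to check their own solution to these exercises, one way the hints might fit together is sketched below. It is Python 3 and only an illustration: the name `context_windows` and the `before`/`after` parameters are hypothetical, and it yields a copy of the FIFO (a plain list) rather than the deque itself, so that windows collected by the caller stay valid.

```python
import collections

def context_windows(it, before=5, after=5):
    """Yield (window, index) for every line; window[index] is the current line."""
    it = iter(it)
    fifo = collections.deque()
    # starting case: read the first line plus up to `after` lines of look-ahead
    for line in it:
        fifo.append(line)
        if len(fifo) == after + 1:
            break
    index = 0  # position of the current line inside the FIFO
    for line in it:
        yield list(fifo), index
        fifo.append(line)
        if index < before:
            index += 1      # near the start: the window is still growing
        else:
            fifo.popleft()  # steady state: slide forward by one line
    # closing case: nothing left to append, so the look-ahead shrinks
    while index < len(fifo):
        yield list(fifo), index
        if index < before:
            index += 1      # short file: the window never reached full size
        else:
            fifo.popleft()
```

A caller would then test `needle in window[index]` and print `window` on a match. Files shorter than the window fall out of the same loops, because the priming loop simply ends early and the closing loop walks the index through whatever was read.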

Although I like the simplicity of Alex's answer, it would require lots of memory when grepping large files. How about this algorithm?

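import os
for filename in (f for f in os.listdir(".") if os.path.isfile(f)):
    prevLines = []    # up to 5 lines before the current one that were not printed
    followCount = 0   # lines of trailing context still to print
    for line in open(filename):
        if "foo" in line:
            # flush the buffered leading context, then the match itself
            for prevLine in prevLines:
                print prevLine.strip()
            print line.strip()
            prevLines = []
            followCount = 5
        elif followCount > 0:
            # trailing context of an earlier match
            print line.strip()
            followCount -= 1
        else:
            # not printed: keep it as possible leading context
            prevLines.append(line)
            if len(prevLines) > 5:
                prevLines.pop(0)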
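The same streaming idea can also be packaged as a reusable generator. The following is a rough Python 3 sketch under stated assumptions, not the answerer's code: `grep_stream` and its parameters are names invented here, and `collections.deque(maxlen=...)` takes the place of the manual `pop(0)` trimming. It buffers only lines that have not been emitted, so each line appears at most once in the output.

```python
import collections

def grep_stream(lines, needle, before=5, after=5):
    """Yield matches with context while buffering at most `before` lines."""
    pending = collections.deque(maxlen=before)  # lines seen but not emitted
    follow = 0                                  # trailing-context lines still owed
    for line in lines:
        if needle in line:
            yield from pending    # leading context that has not been emitted
            pending.clear()
            yield line
            follow = after
        elif follow > 0:
            yield line            # trailing context of an earlier match
            follow -= 1
        else:
            pending.append(line)  # the oldest entry is dropped automatically
```

Because `lines` can be an open file object, memory use stays bounded by `before` lines regardless of file size.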
