How do I extract a range of lines from a text file using sed -n but in Python?

Say I have a 10GB file with 20,000 lines filled with the digits of pi.

  • 123123
  • 12312312
  • 123123
  • 123123
  • 12312312
  • 123123

How do I extract lines 10,000 to 20,000 using the unix command sed -n?

I'd like each line, including its newline character, to be written to a file using the code below.

So far, I have the following:

com = "sed -n \' " + str(window[0]) + "," + str(window[1]) + "p\' " + "sample.txt" + ">" + "output.txt"
os.system(com)

but it is throwing concatenation errors.

How should I phrase the command sed -n for Python in the program below?

inputFileName = "sample.txt"

import itertools
import linecache


def sliding_window(window_size, step_size, last_window_start):
    for i in xrange(0, last_window_start, step_size):
        yield (i, i + window_size)
    yield (last_window_start, total_pi_digits)

def PiCrop(window_size, step_size):

    f = open(inputFileName, 'r')

    first_line = f.readline().split()

    total_pi_digits = int(first_line[0])

    last_window_start = total_pi_digits-(total_pi_digits%window_size)

    lastcounter = (total_pi_digits//window_size)*(window_size/step_size)

    flags = [False for i in range(lastcounter)]

    first_line[0] = str(window_size)
    second_line = f.readline().split()
    offset = int(round(float(second_line[0].strip('\n'))))
    first_line = " ".join(first_line)

    f. close()

    with open(inputFileName, 'r') as f:
        header = f.readline()

        for counter, window in enumerate(sliding_window(window_size,step_size,last_window_start)):

            with open('PiCrop_{}.txt'.format(counter), 'w') as output:

                if (flags[counter] == False):
                    flags[counter] = True

                    headerline = float(linecache.getline(inputFileName, window[1]+1)) - offset
                    output.write(str(window_size) + " " + str("{0:.4f}".format(headerline)) + " " + 'L' + '\n')

                    com = "sed -n \' " + str(window[0]) + "," + str(window[1]) + "p\' " + "sample.txt" + ">" + "output.txt"
                os.system(com)

PiCrop(1000,500)
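
If calling sed from Python is still the goal, one way to sidestep string concatenation is to pass the command as an argument list, so no shell quoting is involved. The following is only a sketch: sed_range_command and extract_with_sed are hypothetical helper names, not part of the program above, and it assumes sed is on the PATH. Note that sed line addresses start at 1, so a window starting at 0 would need to be shifted.

```python
import subprocess


def sed_range_command(first, last, path):
    """Build the argument list for: sed -n 'first,lastp' path.

    Hypothetical helper; first and last are 1-based and inclusive,
    matching how sed numbers lines.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return ["sed", "-n", "{},{}p".format(first, last), path]


def extract_with_sed(first, last, src, dst):
    """Run sed and redirect its output to dst (assumes sed is installed)."""
    with open(dst, "w") as out:
        # Passing a list avoids the shell, so no quoting or ">" is needed.
        subprocess.check_call(sed_range_command(first, last, src), stdout=out)
```

With these helpers, extract_with_sed(10000, 20000, "sample.txt", "output.txt") would correspond to sed -n '10000,20000p' sample.txt > output.txt.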

You can yield each line from the file:

def lines(filename):
    with open(filename) as f:
        for line in f:
            yield line

And you can slice the sequence using islice:

from itertools import islice

with open('PiCrop.txt', 'w') as output:
    # islice counts from 0 and excludes stop: this yields lines 10,000 to 20,000
    for line in islice(lines('sample.txt'), 9999, 20000):
        output.write(line)
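
The same idea can be wrapped in a small function that takes sed-style line numbers. This is a sketch rather than a definitive implementation: extract_lines is a hypothetical name, and it assumes the range is 1-based and inclusive, like sed -n 'first,lastp'. A file object is already a lazy iterator over lines, so it can be passed to islice directly and the whole file is never held in memory.

```python
from itertools import islice


def extract_lines(src, dst, first, last):
    """Copy lines first..last (1-based, inclusive) from src to dst.

    Hypothetical helper mirroring sed -n 'first,lastp'.
    Returns the number of lines written.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    written = 0
    with open(src) as infile, open(dst, "w") as outfile:
        # islice is 0-based with an exclusive stop, hence first - 1 and last.
        for line in islice(infile, first - 1, last):
            outfile.write(line)
            written += 1
    return written
```

For the question's range, the call would be extract_lines('sample.txt', 'PiCrop.txt', 10000, 20000). If the file has fewer lines than last, the function simply stops at the end of the file.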

