Split text file after specific line in python

I'm trying to write code to read Fresco files and plot the results. Fresco produces one big file that looks something like this:

theta  sigma
1        0.1
2        0.1
3        0.2
...
END
some text...
theta   sigma
1        0.3
2        0.2
...
END
more data...

I want to produce a new file after every 'END' so that I can analyze each block of data separately. I tried some of the solutions proposed in answers to other questions, such as:

with open('fort.16', 'r') as infile, open('output_fort.16', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == '# legend':
            copy = True
            continue
        elif line.strip() == 'End':
            copy = False
        elif copy:
            outfile.write(line)

but this is not what I need. I'm fairly new to Python, so any help is much appreciated.

I managed to solve this with a nested generator:

import re

SECTION_START = re.compile(r'^\s*theta\s+sigma\s*$')
SECTION_END = re.compile(r'^\s*END\s*$')

def fresco_iter(stream):
    def inner(stream):
        # Yields each line until an end marker is found (or EOF)
        for line in stream:
            if line and not SECTION_END.match(line):
                yield line
                continue
            break

    # Find a start marker, then break off into a nested iterator
    for line in stream:
        if line:
            if SECTION_START.match(line):
                yield inner(stream)
            continue
        break

The fresco_iter function returns a generator that can be for-looped over. It yields one nested generator per section of theta/sigma pairs.

>>> with open('fort.16', 'r') as fh:
...     print(list(fresco_iter(fh)))
[<generator object fresco_iter.<locals>.inner at 0x7fbc6da15678>,
 <generator object fresco_iter.<locals>.inner at 0x7fbc6da15570>]

So to make use of this, you write your own nested loop to process the nested generators.

filename = 'fort.16'

with open(filename, 'r') as fh:
    for nested_iter in fresco_iter(fh):
        print('--- start')
        for line in nested_iter:
            print(line.rstrip())
        print('--- end')

would output...

--- start
1        0.1
2        0.1
3        0.2
--- end
--- start
1        0.3
2        0.2
--- end

This strategy only ever holds one line of your input file in memory at a time (apart from Python's own read buffering), so it works for a file of any size, on even the smallest device... because generators are awesome.
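One consequence of streaming is worth spelling out: the nested generators all read from the same file handle, so each one has to be consumed before asking for the next. The sketch below illustrates this on an in-memory sample. It repeats a compact version of fresco_iter so that it runs on its own, and the SAMPLE text and the in_order / deferred names are made up for illustration.

```python
import io
import re

SECTION_START = re.compile(r'^\s*theta\s+sigma\s*$')
SECTION_END = re.compile(r'^\s*END\s*$')

def fresco_iter(stream):
    # Compact restatement of the generator above, repeated so this runs alone
    def inner():
        for line in stream:
            if SECTION_END.match(line):
                return
            yield line

    for line in stream:
        if SECTION_START.match(line):
            yield inner()

# Hypothetical sample input, shaped like the file in the question
SAMPLE = (
    "theta  sigma\n"
    "1        0.1\n"
    "2        0.1\n"
    "END\n"
    "some text...\n"
    "theta   sigma\n"
    "1        0.3\n"
    "END\n"
)

# Consuming each section before requesting the next one works as expected.
in_order = [[line.split() for line in section]
            for section in fresco_iter(io.StringIO(SAMPLE))]

# Collecting the generators first reads the stream to the end,
# so every nested generator is empty by the time it is iterated.
deferred = [list(section)
            for section in list(fresco_iter(io.StringIO(SAMPLE)))]

print(in_order)
print(deferred)
```

In other words, the nested-loop pattern shown earlier is the intended way to use it; storing the generators for later will not work.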

So to take it all the way... separating the output into individual files:

with open(filename, 'r') as fh_in:
    for (i, nested_iter) in enumerate(fresco_iter(fh_in)):
        with open('{}.part-{:04d}'.format(filename, i), 'w') as fh_out:
            for line in nested_iter:
                fh_out.write(line)

This writes just the numbers to separate files named fort.16.part-0000 and fort.16.part-0001.
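If the end goal is to plot or analyze each block rather than to keep the files, the rows could also be parsed straight into numbers. This is only a rough sketch: read_sections is a hypothetical helper name, it assumes every data row has exactly two whitespace-separated numeric columns, and unlike the generator version it keeps all sections in memory.

```python
import io

def read_sections(stream):
    """Return a list of sections, each a list of (theta, sigma) float pairs."""
    sections = []
    current = None  # None means "not inside a section"
    for line in stream:
        fields = line.split()
        if fields == ["theta", "sigma"]:
            current = []                      # header row opens a new section
        elif fields == ["END"]:
            if current is not None:
                sections.append(current)      # END closes the open section
            current = None
        elif current is not None and len(fields) == 2:
            try:
                current.append((float(fields[0]), float(fields[1])))
            except ValueError:
                pass                          # skip rows that are not numeric
    return sections

# Hypothetical sample input, shaped like the file in the question
sample = io.StringIO(
    "theta  sigma\n1 0.1\n2 0.1\nEND\nsome text...\ntheta sigma\n1 0.3\nEND\n"
)
parsed = read_sections(sample)
print(parsed)
```

Each inner list can then be handed to whatever plotting code you prefer.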

I hope this helps, happy coding!

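temp = []  # lines collected since the previous "END"

with open("random.txt") as fp:
    for i, line in enumerate(fp):
        if line.strip() == "END":
            # Name the output after the (0-based) line index of this "END"
            with open("file" + str(i) + ".txt", "w") as new_file:
                for row in temp:
                    new_file.write(row + "\n")
            temp = []
            continue
        temp.append(line.strip())

# Note: lines after the final "END" are collected but never written out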

Here you go; this one creates a new file every time. The file name is "file" followed by the (0-based) index of the line where "END" was found. :)
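If sequential names (file0.txt, file1.txt, ...) are preferred over line indices, the same idea could be wrapped in a small function. This is a sketch under assumptions, not part of the answer above: split_on_end, its marker parameter and the output directory argument are hypothetical names, and text after the last marker is dropped just as in the loop above.

```python
from pathlib import Path

def split_on_end(src, out_dir, marker="END"):
    """Write each block that ends with `marker` to out_dir/file<N>.txt.

    Returns the list of paths written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []   # paths of the files created so far
    block = []     # lines collected since the previous marker
    with open(src) as fp:
        for line in fp:
            if line.strip() == marker:
                target = out_dir / "file{}.txt".format(len(written))
                target.write_text("".join(block))
                written.append(target)
                block = []
            else:
                block.append(line)  # keep the original newline
    return written
```

Calling split_on_end("random.txt", "parts") would then create parts/file0.txt, parts/file1.txt and so on, one per "END" line found.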
