Extract the values between two strings in a text file using Python

Question

Let's say I have a text file with the following content:

fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk

Now I need to write Python code that reads the text file and copies the contents between Start and End to another file.

I wrote the following code:

inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
keepCurrentSet = True
for line in inFile:
    buffer.append(line)
    if line.startswith("Start"):
        #---- starts a new data set
        if keepCurrentSet:
            outFile.write("".join(buffer))
        #now reset our state
        keepCurrentSet = False
        buffer = []
    elif line.startswith("End"):
        keepCurrentSet = True
inFile.close()
outFile.close()

I'm not getting the expected output; I'm only getting Start. What I want is all the lines between Start and End, excluding the Start and End lines themselves.

Answer 1

In case your text file contains multiple Start/End pairs, this collects the data from all of them into one output file, excluding every Start and End line.

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
            continue
        elif line.strip() == "End":
            copy = False
            continue
        elif copy:
            outfile.write(line)
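The same flag-based loop can also be packaged as a generator that works on any iterable of lines, which makes it easy to check without touching the filesystem. This is a sketch: the name `lines_between` and its default `start`/`end` arguments are illustrative, not part of the original answer, and `io.StringIO` only stands in for an open text file.

```python
import io
from typing import Iterable, Iterator


def lines_between(lines: Iterable[str], start: str = "Start", end: str = "End") -> Iterator[str]:
    """Yield every line that lies between a start marker line and an end marker line.

    Marker lines are compared after strip() and are never yielded themselves.
    Every Start/End block in the input is included, in order.
    """
    copy = False
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            copy = True       # begin copying after this line
        elif stripped == end:
            copy = False      # stop copying; the End line itself is skipped
        elif copy:
            yield line


# io.StringIO behaves like an open text file, so it can stand in for data.txt
sample = io.StringIO(
    "fdsjhgjhg\nfdshkjhk\nStart\nGood Morning\nHello World\nEnd\ndashjkhjk\ndsfjkhk\n"
)
print("".join(lines_between(sample)), end="")  # prints "Good Morning" and "Hello World"
```

With real files, `outfile.writelines(lines_between(infile))` does the same job as the loop in the answer.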

Answer 2

If the text file isn't too large, you can read its whole content and then use a regular expression:

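import re
with open('data.txt') as myfile:
    content = myfile.read()

# Capture only the lines between Start and End; re.DOTALL lets .*? span newlines
match = re.search(r'Start\n(.*?)End', content, re.DOTALL)
if match:  # re.search returns None when the markers are not found
    with open("result.txt", "w") as myfile2:
        myfile2.write(match.group(1))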
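If the file may contain several Start/End blocks, `re.findall` with one capture group returns just the text inside each block. The `^...$` anchors (active under `re.MULTILINE`) are an addition in this sketch so that only whole lines equal to Start or End count as markers; the `content` string below is a stand-in for data.txt.

```python
import re

# Sample text standing in for the contents of data.txt
content = (
    "fdsjhgjhg\nfdshkjhk\n"
    "Start\nGood Morning\nHello World\nEnd\n"
    "dashjkhjk\n"
    "Start\nSecond block\nEnd\n"
)

# re.MULTILINE: ^ and $ match at line boundaries, so "Ending" is not treated as End.
# re.DOTALL: .*? may span newlines; being non-greedy, it stops at the first End line.
pattern = re.compile(r"^Start$\n(.*?)^End$", re.DOTALL | re.MULTILINE)
blocks = pattern.findall(content)  # one string per block, markers excluded
print(blocks)
```

Joining the result with `"".join(blocks)` gives the same single-file output as the line-based answers.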

Answer 3

I'm not a Python expert, but this code should do the job.

inFile = open("data.txt")
outFile = open("result.txt", "w")
keepCurrentSet = False
for line in inFile:
    if line.startswith("End"):
        keepCurrentSet = False

    if keepCurrentSet:
        outFile.write(line)

    if line.startswith("Start"):
        keepCurrentSet = True
inFile.close()
outFile.close()

Answer 4

Using itertools.dropwhile, itertools.takewhile, and itertools.islice:

import itertools

with open('data.txt') as f, open('result.txt', 'w') as fout:
    it = itertools.dropwhile(lambda line: line.strip() != 'Start', f)
    it = itertools.islice(it, 1, None)
    it = itertools.takewhile(lambda line: line.strip() != 'End', it)
    fout.writelines(it)

UPDATE: As inspectorG4dget commented, the code above copies only the first block. To copy multiple blocks, use the following:

import itertools

with open('data.txt', 'r') as f, open('result.txt', 'w') as fout:
    while True:
        it = itertools.dropwhile(lambda line: line.strip() != 'Start', f)
        if next(it, None) is None: break
        fout.writelines(itertools.takewhile(lambda line: line.strip() != 'End', it))
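The UPDATE loop works because `dropwhile` and `takewhile` keep pulling from the same underlying iterator, so each pass resumes right after the previous End line. A variation, sketched under the assumption that you want each block kept separately instead of concatenated into one file; `blocks_between` is a hypothetical helper name.

```python
import io
import itertools
from typing import Iterable, List


def blocks_between(lines: Iterable[str], start: str = "Start", end: str = "End") -> List[List[str]]:
    """Return each Start/End block as its own list of lines (marker lines excluded)."""
    it = iter(lines)  # one shared iterator, so each pass continues where the last stopped
    blocks = []
    while True:
        rest = itertools.dropwhile(lambda line: line.strip() != start, it)
        if next(rest, None) is None:  # no further start marker: nothing left to collect
            break
        # takewhile stops at (and consumes) the end marker line
        blocks.append(list(itertools.takewhile(lambda line: line.strip() != end, rest)))
    return blocks


sample = io.StringIO("x\nStart\na\nb\nEnd\ny\nStart\nc\nEnd\n")
print(blocks_between(sample))  # → [['a\n', 'b\n'], ['c\n']]
```

A Start without a matching End simply yields everything up to the end of the file as the last block.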

Answer 5

Move the outFile.write call into the second if (the End check):

inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
for line in inFile:
    if line.startswith("Start"):
        buffer = ['']
    elif line.startswith("End"):
        outFile.write("".join(buffer))
        buffer = []
    elif buffer:
        buffer.append(line)
inFile.close()
outFile.close()

Answer 6

import re

with open("data.txt") as inFile:
    buffer1 = inFile.read()

# The lookbehind/lookahead keep Start and End out of each match,
# and re.DOTALL lets .*? span several lines
buffer1 = re.findall(r"(?<=Start\n)(.*?)(?=End)", buffer1, re.DOTALL)
with open("result.txt", "w") as outFile:
    outFile.write("".join(buffer1))

Answer 7

I would handle it like this:

inFile = open("data.txt")
outFile = open("result.txt", "w")

data = inFile.readlines()

outFile.write("".join(data[data.index('Start\n')+1:data.index('End\n')]))
inFile.close()
outFile.close()
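`data.index('Start\n')` raises ValueError when a marker is missing, and it cannot find an `End` on the last line if the file has no trailing newline. A hedged variant of the same slicing idea compares stripped lines and searches for End only after Start; `slice_between` is a name introduced here for illustration.

```python
from typing import List


def slice_between(lines: List[str], start: str = "Start", end: str = "End") -> List[str]:
    """Return the lines strictly between the first start marker and the next end marker.

    Markers are compared after strip(), so a final "End" without a trailing
    newline is still found. Raises ValueError if either marker is missing.
    """
    stripped = [line.strip() for line in lines]
    first = stripped.index(start)          # ValueError if there is no start marker
    last = stripped.index(end, first + 1)  # look for End only after Start
    return lines[first + 1:last]
```

Usage with the files from the answer would be `outFile.write("".join(slice_between(inFile.readlines())))`.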

Answer 8

If you want to keep the start and end lines (the keywords) while extracting the lines between two strings, this approach works.

Below is the code snippet I used to extract SQL statements from a shell script:

def process_lines(in_filename, out_filename, start_kw, end_kw):
    try:
        inp = open(in_filename, 'r', encoding='utf-8', errors='ignore')
        out = open(out_filename, 'w+', encoding='utf-8', errors='ignore')
    except FileNotFoundError as err:
        print(f"File {in_filename} not found", err)
        raise
    except OSError as err:
        print(f"OS error occurred trying to open {in_filename}", err)
        raise
    except Exception as err:
        print(f"Unexpected error opening {in_filename} is",  repr(err))
        raise
    else:
        with inp, out:
            copy = False
            for line in inp:
                starts = line.lstrip().lower().startswith(start_kw)
                ends = line.rstrip().endswith(end_kw)
                if starts:
                    # keep the line that starts with the keyword; if the statement
                    # also ends on this same line, it is already complete
                    out.write(line)
                    copy = not ends
                elif ends:
                    # keep the line that ends with the keyword, if inside a statement
                    if copy:
                        out.write(line)
                    copy = False
                elif copy:
                    out.write(line)


if __name__ == '__main__':
    infile = "/Users/testuser/Downloads/testdir/BTEQ_TEST.sh"
    outfile = f"{infile}.sql"
    statement_start_list = ['database', 'create', 'insert', 'delete', 'update', 'merge']
    statement_end = ";"
    process_lines(infile, outfile, tuple(statement_start_list), statement_end)
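For the plain Start/End case from the question, keeping the marker lines only requires reordering the checks: switch copying on before writing the line and switch it off after writing it. A minimal sketch, assuming exact (stripped) marker lines; `lines_including_markers` is a hypothetical name.

```python
import io
from typing import Iterable, Iterator


def lines_including_markers(lines: Iterable[str], start: str = "Start", end: str = "End") -> Iterator[str]:
    """Yield each Start/End block including the marker lines themselves."""
    copy = False
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            copy = True       # turned on before writing, so Start is kept
        if copy:
            yield line
        if stripped == end:
            copy = False      # turned off after writing, so End is kept


sample = io.StringIO("fdsjhgjhg\nStart\nGood Morning\nHello World\nEnd\ndsfjkhk\n")
print("".join(lines_including_markers(sample)), end="")
```

An End line that appears before any Start is ignored, because copying is still off at that point.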

Answer 9

Files are iterators in Python, so you don't need to keep a "flag" variable to tell you which lines to write. You can simply start another loop over the same file when you reach the start line, and break out of it when you reach the end line:

with open("data.txt") as in_file, open("result.txt", 'w') as out_file:
    for line in in_file:
        if line.strip() == "Start":
            for line in in_file:
                if line.strip() == "End":
                    break
                out_file.write(line)
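Because the inner loop consumes from the same iterator as the outer one, the outer loop resumes right after the End line once the inner loop breaks, so every block is copied. The sketch below wraps the answer's loop in a hypothetical `copy_between` function and uses `io.StringIO` objects in place of the real files to show this in memory.

```python
import io


def copy_between(in_file, out_file, start="Start", end="End"):
    """Write the lines between start and end marker lines from in_file to out_file."""
    for line in in_file:
        if line.strip() == start:
            # The inner loop pulls from the same iterator, so after the break
            # the outer loop continues with the line following End.
            for line in in_file:
                if line.strip() == end:
                    break
                out_file.write(line)


source = io.StringIO("x\nStart\na\nEnd\ny\nStart\nb\nEnd\nz\n")
target = io.StringIO()
copy_between(source, target)
print(target.getvalue(), end="")  # prints "a" and "b" on separate lines
```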

Question description

9 answers

Answer 1: 47 (accepted), 2013-09-18 06:17:56
Answer 2: 6, 2013-09-18 06:18:48
Answer 3: 5, 2013-09-18 06:18:36
Answer 4: 5, 2013-09-18 06:21:14
Answer 5: 3, 2013-09-18 06:19:14
Answer 6: 2, 2013-09-18 06:49:59
Answer 7: 1, 2013-09-18 06:51:00
Answer 8: 0, 2021-06-23 03:57:16
Answer 9: 0, 2021-08-19 11:39:00
