Extract values between two strings in a text file using Python

Let's say I have a text file with the following content:

fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk

Now I need to write Python code that reads the text file and copies the contents between Start and End to another file.

I wrote the following code:

inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
keepCurrentSet = True
for line in inFile:
    buffer.append(line)
    if line.startswith("Start"):
        #---- starts a new data set
        if keepCurrentSet:
            outFile.write("".join(buffer))
        #now reset our state
        keepCurrentSet = False
        buffer = []
    elif line.startswith("End"):
        keepCurrentSet = True
inFile.close()
outFile.close()

I'm not getting the desired output; I only get Start. What I want is all the lines between Start and End, excluding Start and End themselves.

In case you have multiple "Start"s and "End"s in your text file, this will copy all the data between them, excluding the "Start" and "End" lines themselves:

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
            continue
        elif line.strip() == "End":
            copy = False
            continue
        elif copy:
            outfile.write(line)
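
To check the flag logic without touching the disk, the same loop can be wrapped in a small generator and fed an in-memory file. This is only a sketch: the name `lines_between` and its `start`/`end` parameters are hypothetical, not part of the answer above.

```python
import io

def lines_between(lines, start="Start", end="End"):
    """Yield the lines found between start and end marker lines (markers excluded)."""
    copy = False
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            copy = True       # begin copying after this marker line
        elif stripped == end:
            copy = False      # stop copying at this marker line
        elif copy:
            yield line

# Two blocks in one input, to show that every block is collected
sample = io.StringIO(
    "fdsjhgjhg\nStart\nGood Morning\nHello World\nEnd\ndashjkhjk\n"
    "Start\nSecond block\nEnd\n"
)
result = list(lines_between(sample))
print(result)
```

With a real file, `outfile.writelines(lines_between(infile))` would write the same lines to disk.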

If the text file isn't too large, you can read its whole content and then use a regular expression:

import re
with open('data.txt') as myfile:
    content = myfile.read()

# group(1) is the captured text only, so the Start and End markers are left out
text = re.search(r'Start\n(.*?)End', content, re.DOTALL).group(1)
with open("result.txt", "w") as myfile2:
    myfile2.write(text)
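
If the file may contain several blocks, or lines that merely begin with "End" (such as "Endless"), one possible variant anchors both markers to whole lines and collects every block with `re.findall`. The names `BLOCK_RE` and `extract_blocks` are hypothetical, and the pattern assumes each marker sits on a line of its own.

```python
import re

# re.MULTILINE makes ^ and $ match at line boundaries;
# re.DOTALL lets "." cross newlines so a block can span several lines.
BLOCK_RE = re.compile(r"^Start[ \t]*\n(.*?)^End[ \t]*$", re.DOTALL | re.MULTILINE)

def extract_blocks(content):
    """Return the text of every Start...End block, markers excluded."""
    return BLOCK_RE.findall(content)

content = (
    "x\nStart\nGood Morning\nHello World\nEnd\ny\n"
    "Start\nEndless summer\nEnd\n"
)
blocks = extract_blocks(content)
print(blocks)
```

Because `findall` returns an empty list when nothing matches, this form also avoids the `AttributeError` that `re.search(...).group()` raises on a file without markers.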

I'm not a Python expert, but this code should do the job:

inFile = open("data.txt")
outFile = open("result.txt", "w")
keepCurrentSet = False
for line in inFile:
    if line.startswith("End"):
        keepCurrentSet = False

    if keepCurrentSet:
        outFile.write(line)

    if line.startswith("Start"):
        keepCurrentSet = True
inFile.close()
outFile.close()

Using itertools.dropwhile, itertools.takewhile, and itertools.islice:

import itertools

with open('data.txt') as f, open('result.txt', 'w') as fout:
    it = itertools.dropwhile(lambda line: line.strip() != 'Start', f)
    it = itertools.islice(it, 1, None)
    it = itertools.takewhile(lambda line: line.strip() != 'End', it)
    fout.writelines(it)
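
To see what each stage of that pipeline contributes, here is a small sketch that runs the three steps on an in-memory list and materializes the intermediate results. The variable names are illustrative only.

```python
import itertools

lines = ["fdsjhgjhg\n", "Start\n", "Good Morning\n", "Hello World\n", "End\n", "dashjkhjk\n"]

# 1. Skip everything before the Start line (the Start line itself is kept)
after_drop = list(itertools.dropwhile(lambda line: line.strip() != "Start", lines))

# 2. Skip one item, which is the Start line
after_slice = list(itertools.islice(after_drop, 1, None))

# 3. Keep lines until the End line is reached (the End line is not kept)
block = list(itertools.takewhile(lambda line: line.strip() != "End", after_slice))
print(block)
```

In the file-based version nothing is turned into a list, so lines are read lazily and only one line is held in memory at a time.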

UPDATE: As inspectorG4dget commented, the code above copies only the first block. To copy multiple blocks, use the following:

import itertools

with open('data.txt', 'r') as f, open('result.txt', 'w') as fout:
    while True:
        it = itertools.dropwhile(lambda line: line.strip() != 'Start', f)
        if next(it, None) is None: break
        fout.writelines(itertools.takewhile(lambda line: line.strip() != 'End', it))

Move the outFile.write call into the second if:

inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
for line in inFile:
    if line.startswith("Start"):
        buffer = ['']
    elif line.startswith("End"):
        outFile.write("".join(buffer))
        buffer = []
    elif buffer:
        buffer.append(line)
inFile.close()
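outFile.close()

import re

inFile = open("data.txt")
outFile = open("result.txt", "w")
# Read the whole file into one string
buffer1 = inFile.read()

# re.DOTALL lets "." match newlines, so each block can span several lines;
# the lookarounds keep the Start and End marker lines out of the result
blocks = re.findall(r"(?<=Start\n)(.*?)(?=End)", buffer1, re.DOTALL)
outFile.write("".join(blocks))
inFile.close()
outFile.close()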

I would handle it like this:

inFile = open("data.txt")
outFile = open("result.txt", "w")

data = inFile.readlines()

outFile.write("".join(data[data.index('Start\n')+1:data.index('End\n')]))
inFile.close()
outFile.close()
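
Note that `list.index` raises `ValueError` when a marker is missing, and that searching for 'End\n' from the start of the list can find an End that precedes Start. A defensive sketch of the same slicing idea follows; the helper name `slice_between` is hypothetical, and it assumes the marker lines end with a newline.

```python
def slice_between(lines, start="Start\n", end="End\n"):
    """Return the lines between the first start marker and the next end marker."""
    try:
        first = lines.index(start) + 1
        # Search for the end marker only after the start marker
        last = lines.index(end, first)
    except ValueError:
        # One of the markers is missing: nothing to extract
        return []
    return lines[first:last]

data = ["a\n", "End\n", "Start\n", "Good Morning\n", "Hello World\n", "End\n", "b\n"]
selected = slice_between(data)
print(selected)
```

Like the original, this handles only the first block and expects the whole file in memory, as returned by `readlines()`.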

If you want to keep the start and end lines/keywords while extracting the lines between two strings, the approach needs a small change.

Below is the code snippet I used to extract SQL statements from a shell script:

def process_lines(in_filename, out_filename, start_kw, end_kw):
    try:
        inp = open(in_filename, 'r', encoding='utf-8', errors='ignore')
        out = open(out_filename, 'w+', encoding='utf-8', errors='ignore')
    except FileNotFoundError as err:
        print(f"File {in_filename} not found", err)
        raise
    except OSError as err:
        print(f"OS error occurred trying to open {in_filename}", err)
        raise
    except Exception as err:
        print(f"Unexpected error opening {in_filename} is",  repr(err))
        raise
    else:
        with inp, out:
            copy = False
            for line in inp:
                # start and end keywords on the same line: write it and stay outside a block
                if line.lstrip().lower().startswith(start_kw) and line.rstrip().endswith(end_kw):
                    out.write(line)
                    copy = False
                    continue
                elif line.lstrip().lower().startswith(start_kw):
                    # start of a multi-line statement: keep the start line
                    copy = True
                    out.write(line)
                    continue
                elif line.rstrip().endswith(end_kw):
                    if copy:  # keep the end line only when inside a statement
                        out.write(line)
                    copy = False
                    continue
                elif copy:
                    # line inside a statement
                    out.write(line)


if __name__ == '__main__':
    infile = "/Users/testuser/Downloads/testdir/BTEQ_TEST.sh"
    outfile = f"{infile}.sql"
    statement_start_list = ['database', 'create', 'insert', 'delete', 'update', 'merge']
    statement_end = ";"
    process_lines(infile, outfile, tuple(statement_start_list), statement_end)

Files are iterators in Python, so you don't need a "flag" variable to tell you which lines to write. You can simply start another loop over the same file when you reach the start line, and break out of it when you reach the end line:

with open("data.txt") as in_file, open("result.txt", 'w') as out_file:
    for line in in_file:
        if line.strip() == "Start":
            for line in in_file:
                if line.strip() == "End":
                    break
                out_file.write(line)
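
The nested-loop trick works because both loops consume the same iterator. A sketch of the same idea that keeps each block separate is shown below; the generator name `iter_blocks` is hypothetical. It calls `iter()` explicitly so that it also behaves correctly when given a list, which, unlike a file, is not its own iterator.

```python
def iter_blocks(lines, start="Start", end="End"):
    """Yield one list of lines per Start...End block, markers excluded."""
    it = iter(lines)  # shared by both loops, so the inner loop resumes where the outer stopped
    for line in it:
        if line.strip() == start:
            block = []
            for line in it:
                if line.strip() == end:
                    break
                block.append(line)
            yield block

sample = ["x\n", "Start\n", "Good Morning\n", "Hello World\n", "End\n",
          "y\n", "Start\n", "Second block\n", "End\n"]
found = list(iter_blocks(sample))
print(found)
```

If the input ends before an End line appears, this sketch still yields the lines collected so far. Whether that is desirable depends on the data.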
