
How to grab the lines AFTER a matched line in python

I am an amateur who has been using Python on and off for some time now. Sorry if this is a silly question, but I was wondering if anyone knew an easy way to grab a bunch of lines if the format in the input file is like this:

" Heading 1
Line 1
Line 2
Line 3
Heading 2
Line 1
Line 2
Line 3 "

I won't know how many lines are after each heading, but I want to grab them all. All I know is the name, or a regular expression pattern, for the heading.

The only way I know to read a file is the "for line in file:" way, but I don't know how to grab the lines AFTER the line I'm currently on. Hope this makes sense, and thanks for the help!

*Thanks for all the responses! I have tried to implement some of the solutions, but my problem is that not all the headings have the same name, and I'm not sure how to work around it. I need a different regular expression for each... any suggestions?*
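One way to handle headings that differ in name is to describe them all with a single compiled regular expression and test each line against it. A minimal sketch, where the alternation `Heading|Chapter|Section` is a placeholder; substitute the words (or patterns) your real headings use:

```python
import re

# Hypothetical pattern: extend the alternation with each real heading form.
# As written it matches lines such as "Heading 1", "Chapter 2", "Section 10".
heading_re = re.compile(r"^(?:Heading|Chapter|Section)\s+\d+$")

def group_lines(lines):
    """Map each heading to the list of lines that follow it."""
    groups = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                       # skip blank lines
        if heading_re.match(line):
            current = line                 # start a new group at each heading
            groups[current] = []
        elif current is not None:
            groups[current].append(line)   # lines before any heading are dropped
    return groups
```

Because all the heading forms live in one pattern, you don't need a separate regular expression per heading; one alternation covers them.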

Generator Functions

def group_by_heading(some_source):
    buffer = []
    for line in some_source:
        if line.startswith("Heading"):
            if buffer:
                yield buffer
            buffer = [line]
        else:
            buffer.append(line)
    if buffer:  # don't yield an empty buffer for an empty source
        yield buffer

with open("some_file", "r") as source:
    for heading_and_lines in group_by_heading(source):
        heading = heading_and_lines[0]
        lines = heading_and_lines[1:]
        # process away.
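If the headings are not all literally named "Heading", the `startswith` test in `group_by_heading` can be swapped for a compiled regular expression. A sketch of that variation, using a placeholder pattern you would adjust to your real headings:

```python
import re

# Hypothetical combined pattern; extend the alternation per heading form.
heading_re = re.compile(r"^(?:Heading|Chapter)\s+\d+")

def group_by_heading(some_source, pattern=heading_re):
    """Yield [heading, line, line, ...] lists, one per heading."""
    buffer = []
    for line in some_source:
        if pattern.match(line):
            if buffer:
                yield buffer
            buffer = [line]
        else:
            buffer.append(line)
    if buffer:
        yield buffer
```

The generator's structure is unchanged; only the heading test differs, so the same `heading_and_lines[0]` / `heading_and_lines[1:]` consumption loop still applies.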

You could use a variable to mark which heading you are currently tracking, and if it is set, grab every line until you find another heading:

data = {}
heading = None
for line in file:
    line = line.strip()
    if not line:
        continue

    if line.startswith('Heading '):
        if line not in data:
            data[line] = []
        heading = line
        continue

    if heading is not None:  # ignore lines before the first heading
        data[heading].append(line)

Here's a http://codepad.org snippet that shows how it works: http://codepad.org/KA8zGS9E

Edit: If you don't care about the actual heading values and just want a list at the end, you can use this:

data = []
for line in file:
    line = line.strip()
    if not line: continue

    if line.startswith('Heading '):
        continue

    data.append(line)

Basically, you don't really need to track a variable for the heading; instead you can just filter out all lines that match the Heading pattern.

Other than a generator, we can create a dict where each key is a heading and the value is a list holding the lines under it. Here is the code:

odd_map = {}
odd_list = []
with open(file, 'r') as myFile:
    for line in myFile:  # iterate directly; no need to read all lines first
        if "Heading" in line:
            odd_list = []
            odd_map[line.strip()] = odd_list
        else:
            odd_list.append(line.strip())

for company, odds in odd_map.items():
    print(company)
    for odd in odds:
        print(odd)

I don't really know Python, but here's a bit of pseudocode.

int header_found = 0;

[loop over the lines of the file]

    if (header_found == 1) { [grab line]; header_found = 0; }

    if (line =~ /[regexp for header]/) header_found = 1;

The idea is to have a variable that keeps track of whether or not you've found a header, and if you have, to grab the next line.
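Translated into Python, the flag-based idea above looks like this. Note that, as in the pseudocode, it grabs only the single line immediately after each header; the regular expression shown is a placeholder to replace with your real header pattern:

```python
import re

header_re = re.compile(r"^Heading")  # placeholder pattern for a header line

def lines_after_headers(lines):
    """Collect the one line that follows each header line."""
    grabbed = []
    header_found = False
    for line in lines:
        if header_found:
            grabbed.append(line)   # this is the line right after a header
            header_found = False
        if header_re.match(line):
            header_found = True    # mark so the next line gets grabbed
    return grabbed
```

To collect every line up to the next header rather than just one, you would keep the flag set until another header matches, as the accepted generator answer does.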
