
How to only read lines in a text file after a certain string?

I'd like to read into a dictionary all of the lines in a text file that come after a particular string. I'd like to do this over thousands of text files.

I'm able to identify and print out the particular string ('Abstract') using the following code (taken from this answer):

for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                print(line)

But how do I tell Python to start reading only the lines that come after the string?

Just start another loop when you reach the line you want to start from:

for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                for line in f:  # now you are at the lines you want
                    print(line)  # do work

A file object is its own iterator, so when we reach the line with 'Abstract' in it, the inner loop continues the iteration from the line after it until the iterator has been consumed.

A simple example:

gen = (n for n in range(8))

for x in gen:
    if x == 3:
        print('Starting second loop')
        for x in gen:
            print('In second loop', x)
    else:
        print('In first loop', x)

Produces:

In first loop 0
In first loop 1
In first loop 2
Starting second loop
In second loop 4
In second loop 5
In second loop 6
In second loop 7

You can also use itertools.dropwhile to consume the lines up to the point you want:

from itertools import dropwhile

for files in filepath:
    with open(files, 'r') as f:
        dropped = dropwhile(lambda _line: 'Abstract' not in _line, f)
        next(dropped, '')
        for line in dropped:
            print(line)

Use a boolean to ignore lines up to that point:

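for files in filepath:
    found_abstract = False  # reset the flag for every file
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                found_abstract = True
            if found_abstract:
                # do whatever you want; note that this branch also
                # sees the line that contains 'Abstract' itself
                print(line)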

You can use itertools.dropwhile and itertools.islice here, a pseudo-example:

from itertools import dropwhile, islice

for fname in filepaths:
    with open(fname) as fin:
        start_at = dropwhile(lambda L: 'Abstract' not in L.split(), fin)
        for line in islice(start_at, 1, None): # ignore the line still with Abstract in
            print(line)
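
One detail worth checking in the pseudo-example above: `'Abstract' not in L.split()` compares whole whitespace-separated tokens, so a line such as `Abstract: overview` would not count as a match. The sketch below contrasts the token test with a plain substring test. It uses `io.StringIO` and made-up sample text in place of a real file, both of which are assumptions for illustration only.

```python
from itertools import dropwhile, islice
import io

# Hypothetical sample text standing in for a real file.
sample = io.StringIO(
    "Title\nAbstract: overview\nfirst body line\nAbstract\nsecond body line\n"
)

# Token test: 'Abstract:' is not the token 'Abstract', so line 2 does not match.
start_at = dropwhile(lambda L: 'Abstract' not in L.split(), sample)
after_token = [line.rstrip('\n') for line in islice(start_at, 1, None)]

sample.seek(0)  # rewind the in-memory file before the second pass

# Substring test: matches the first line containing 'Abstract' anywhere.
start_at = dropwhile(lambda L: 'Abstract' not in L, sample)
after_substring = [line.rstrip('\n') for line in islice(start_at, 1, None)]

print(after_token)      # ['second body line']
print(after_substring)  # ['first body line', 'Abstract', 'second body line']
```

Which test is appropriate depends on how the marker appears in your files.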

To me, the following code is easier to understand.

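with open(file_name, 'r') as f:
    # Skip lines up to and including the one that contains 'Abstract'.
    # Note: next(f) raises StopIteration if no line contains 'Abstract'.
    while 'Abstract' not in next(f):
        pass
    for line in f:
        # line is now a line after the one that contains 'Abstract'
        print(line)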

Just to clarify, your code already "reads" all the lines. To start "paying attention" to lines after a certain point, you can just set a boolean flag to indicate whether or not lines should be ignored, and check it at each line.

pay_attention = False
for line in f:
    if pay_attention:
        print(line)
    else:  # We haven't found our trigger yet; see if it's in this line
        if 'Abstract' in line:
            pay_attention = True

If you don't mind a little more rearranging of your code, you can also use two partial loops instead: one loop that terminates once you've found your trigger phrase ('Abstract'), and one that reads all following lines. This approach is a little cleaner (and a very tiny bit faster).

for skippable_line in f:  # First skim over all lines until we find 'Abstract'.
    if 'Abstract' in skippable_line:
        break
for line in f:  # The file's iterator starts up again right where we left it.
    print(line)

The reason this works is that the file object returned by open behaves like a generator, rather than, say, a list: it only produces values as they are requested. So when the first loop stops, the file is left with its internal position set at the beginning of the first "unread" line. This means that when you enter the second loop, the first line you see is the first line after the one that triggered the break.
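
To make the contrast with a list concrete, here is a minimal sketch. It assumes an in-memory `io.StringIO` object with made-up contents as a stand-in for a file opened with open; both iterate line by line in the same way.

```python
import io

# Hypothetical in-memory file; iterates line by line like a real text file.
f = io.StringIO("header\nAbstract\nline one\nline two\n")

for skippable_line in f:
    if 'Abstract' in skippable_line:
        break

# The file iterator resumes after the line that triggered the break.
remaining = [line.rstrip('\n') for line in f]
print(remaining)  # ['line one', 'line two']

# A list, by contrast, starts from the beginning on every new loop.
lines = ["header", "Abstract", "line one", "line two"]
for skippable_line in lines:
    if 'Abstract' in skippable_line:
        break
restarted = [line for line in lines]
print(restarted)  # ['header', 'Abstract', 'line one', 'line two']
```

So the two-loop pattern relies on iterating over the file object itself, not over a list built from it (for example with readlines()).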

Making a guess as to how the dictionary is involved, I'd write it this way:

lines = dict()
for filename in filepath:
   with open(filename, 'r') as f:
       for line in f:
           if 'Abstract' in line:
               break
       lines[filename] = tuple(f)

So for each file, your dictionary contains a tuple of lines.

This works because the loop reads up to and including the line you identify, leaving the remaining lines in the file ready to be read from f.
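
If you want to reuse this across thousands of files, the same idea can be wrapped in a small function. This is only a sketch: the helper name `lines_after`, the `marker` parameter, and the throwaway demo files are all hypothetical, not part of the original question.

```python
import os
import tempfile

def lines_after(path, marker='Abstract'):
    """Return a tuple of the lines following the first line that contains marker."""
    with open(path, 'r') as f:
        for line in f:
            if marker in line:
                break
        # If the marker never appears, the iterator is already exhausted,
        # so this returns an empty tuple.
        return tuple(f)

# Demo with two throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        'a.txt': "Title\nAbstract\nalpha\nbeta\n",
        'b.txt': "no marker here\n",
    }
    paths = []
    for name, text in contents.items():
        path = os.path.join(tmp, name)
        with open(path, 'w') as out:
            out.write(text)
        paths.append(path)

    lines = {os.path.basename(p): lines_after(p) for p in paths}

print(lines)  # {'a.txt': ('alpha\n', 'beta\n'), 'b.txt': ()}
```

A file without the marker maps to an empty tuple, which you may want to detect and handle separately.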
