Iteratively extract the lines between two delimiters in a text file, Python

Question

I have text files in the following format:

DELIMITER1
extract me
extract me
extract me
DELIMITER2

I want to extract every block of "extract me" lines between DELIMITER1 and DELIMITER2 in the .txt file.

Here is my current, broken code:

import re

def GetTheSentences(file):
    fileContents = open(file)
    start_rx = re.compile('DELIMITER')   # note: this also matches 'DELIMITER2'
    end_rx = re.compile('DELIMITER2')

    line_iterator = iter(fileContents)
    start = False
    # advance to the first start delimiter
    for line in line_iterator:
        if re.findall(start_rx, line):
            start = True
            break
    # print lines until the end delimiter (only the first block is handled)
    while start:
        next_line = next(line_iterator)
        if re.findall(end_rx, next_line):
            break
        print(next_line)
    next(line_iterator)

Any ideas?

Answer 1

You can reduce this to a single regular expression by using re.S (the DOTALL flag).

import re

def GetTheSentences(infile):
    with open(infile) as fp:
        for result in re.findall('DELIMITER1(.*?)DELIMITER2', fp.read(), re.S):
            print(result)
# extract me
# extract me
# extract me

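This also makes use of the non-greedy operator .*?, so it will find multiple non-overlapping blocks when the file contains several DELIMITER1-DELIMITER2 pairs.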
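A minimal sketch of why both re.S and the non-greedy .*? matter, using an in-memory string instead of a file (the sample text is made up for illustration). The pattern here also consumes the newline right after DELIMITER1, a small deviation from the answer's pattern, so each captured block does not begin with an empty line.

```python
import re

# Made-up sample with two blocks and a line of noise between them.
text = """DELIMITER1
extract me
extract me
DELIMITER2
ignored
DELIMITER1
and me
DELIMITER2
"""

# Non-greedy .*? stops at the nearest DELIMITER2; re.S lets . match newlines.
blocks = re.findall(r'DELIMITER1\n(.*?)DELIMITER2', text, re.S)

# Greedy .* runs to the last DELIMITER2, swallowing both blocks in one match.
greedy = re.findall(r'DELIMITER1\n(.*)DELIMITER2', text, re.S)

# Without re.S, . cannot cross a newline, so these multi-line blocks never match.
no_dotall = re.findall(r'DELIMITER1\n(.*?)DELIMITER2', text)

print(blocks)                               # ['extract me\nextract me\n', 'and me\n']
print([b.splitlines() for b in blocks])     # [['extract me', 'extract me'], ['and me']]
print(len(greedy), no_dotall)               # 1 []
```

Calling splitlines() on each captured block turns it back into a list of lines when you need them one by one.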

Answer 2

This should do what you want:

import re

def GetTheSentences(file):
    start_rx = re.compile('DELIMITER1')
    end_rx = re.compile('DELIMITER2')

    start = False
    output = []
    with open(file) as datafile:
        for line in datafile:
            if start_rx.match(line):
                start = True      # entering a block; skip the delimiter line itself
                continue
            elif end_rx.match(line):
                start = False     # leaving the block
            if start:
                output.append(line)
    return output

Your earlier version looks like it was meant to be an iterator function. Do you want your output returned one item at a time? That's a bit different.
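If one block at a time is what you want, a generator is a natural fit. A minimal sketch, assuming the delimiters appear as whole lines on their own; iter_blocks is a hypothetical name, and it accepts any iterable of lines, so an open file works as well as io.StringIO.

```python
import io
from typing import Iterable, Iterator, List, Optional


def iter_blocks(lines: Iterable[str], start: str = 'DELIMITER1',
                end: str = 'DELIMITER2') -> Iterator[List[str]]:
    """Yield each block of lines found between a start and an end delimiter line."""
    block: Optional[List[str]] = None   # None means "currently outside a block"
    for line in lines:
        text = line.rstrip('\n')
        if text == start:
            block = []                  # open a new block
        elif text == end:
            if block is not None:
                yield block             # hand back one finished block
            block = None
        elif block is not None:
            block.append(text)          # lines outside any block are ignored


sample = io.StringIO(
    "DELIMITER1\nextract me\nextract me\nDELIMITER2\n"
    "noise\n"
    "DELIMITER1\nand me\nDELIMITER2\n"
)
for block in iter_blocks(sample):
    print(block)
# ['extract me', 'extract me']
# ['and me']
```

With a real file the call is the same: open it in a with statement and pass the file object to iter_blocks. Nothing is read ahead of the block currently being built, so large files are not loaded into memory at once.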

Answer 3

If the delimiters are within a single line:

def get_sentences(filename):
    with open(filename) as file_contents:
        d1, d2 = '.', ',' # just example delimiters
        for line in file_contents:
            i1, i2 = line.find(d1), line.find(d2)
            if -1 < i1 < i2:
                yield line[i1+1:i2]


sentences = list(get_sentences('path/to/my/file'))
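The slice line[i1+1:i2] above assumes d1 is exactly one character long. A sketch generalizing it to multi-character delimiters; between_on_line and the << >> delimiters are illustrative and not from the answer. It skips len(d1) characters and searches for d2 only after d1.

```python
from typing import Iterable, Iterator


def between_on_line(lines: Iterable[str], d1: str, d2: str) -> Iterator[str]:
    """Yield the text between d1 and the next d2 on each line."""
    for line in lines:
        i1 = line.find(d1)
        if i1 == -1:
            continue                    # no opening delimiter on this line
        start = i1 + len(d1)            # skip the whole delimiter, not just one char
        i2 = line.find(d2, start)       # look for d2 only after d1
        if i2 != -1:
            yield line[start:i2]


print(list(between_on_line(["a <<x>> b\n", "no delimiters\n", "<<y>>\n"], "<<", ">>")))
# ['x', 'y']
```

One design difference: because d2 is searched for only after d1, a line such as ">> <<z>>" still yields "z", whereas the original -1 < i1 < i2 check skips it.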

If they are on their own lines:

def get_sentences(filename):
    with open(filename) as file_contents:
        d1, d2 = '.', ',' # just example delimiters
        results = []
        for line in file_contents:
            if d1 in line:
                results = []
            elif d2 in line:
                yield results
            else:
                results.append(line)

sentences = list(get_sentences('path/to/my/file'))

Answer 4

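This is a good job for list comprehensions; no regular expressions needed. The first list comprehension strips the trailing \n typically found on each line when you read a txt file. The second list comprehension just uses the in operator to filter out the delimiter lines.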

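def extract_lines(file):
    # strip the trailing newline from every line, closing the file afterwards
    with open(file, 'r') as f:
        scrubbed = [x.strip('\n') for x in f]
    # keep every line that is not one of the delimiters
    return [x for x in scrubbed if x not in ('DELIMITER1', 'DELIMITER2')]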
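Because the filter keeps every line that is not a delimiter, lines outside the blocks end up in the result and the boundaries between blocks are lost. A sketch that stays regex-free but keeps one list per block, using itertools.dropwhile and itertools.takewhile on a single shared iterator; blocks_no_regex is a hypothetical name, and the delimiters are assumed to be whole lines.

```python
from itertools import dropwhile, takewhile
from typing import Iterable, List


def blocks_no_regex(lines: Iterable[str], start: str = 'DELIMITER1',
                    end: str = 'DELIMITER2') -> List[List[str]]:
    """Return the newline-stripped lines of each start/end block, one list per block."""
    it = (x.strip('\n') for x in lines)   # same newline scrubbing as extract_lines
    blocks = []
    while True:
        # Skip everything before the next start delimiter; the first item left is the delimiter.
        rest = dropwhile(lambda x: x != start, it)
        if next(rest, None) is None:
            return blocks                 # no start delimiter left
        # Collect lines up to the end delimiter; takewhile consumes and drops that line.
        blocks.append(list(takewhile(lambda x: x != end, rest)))


lines = ["noise\n", "DELIMITER1\n", "extract me\n", "extract me\n", "DELIMITER2\n",
         "between\n", "DELIMITER1\n", "and me\n", "DELIMITER2\n"]
print(blocks_no_regex(lines))
# [['extract me', 'extract me'], ['and me']]
```

Every call advances the same generator it, which is what lets the loop pick up the next block where the previous one ended. On the same input, extract_lines would also return "noise" and "between".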

4 answers

Answer 1: 21 votes, accepted, 2011-08-17 19:59:42

Answer 2: 2 votes, 2011-08-17 19:54:13

Answer 3: 2 votes, 2011-08-17 19:55:09

Answer 4: 0 votes, 2015-05-10 05:00:01
