Python: How to make sure a line read from a file contains only the given string and nothing else

Question

To make sure I start and stop reading a text file exactly where I want, I place tags such as 'start1' <-> 'end1' and 'start2' <-> 'end2' inside the text file and pass them to my Python script. In my script I read it as:

start_end = ['start1', 'end1']
line_num = []
header = []
# First pass: record the line numbers on which a tag appears.
with open(file_path) as fp1:
    for num, line in enumerate(fp1, 1):
        for i in start_end:
            if i in line:
                line_num.append(num)
print('\nLine number: ', line_num)
# Second pass: collect the lines between the two tags.
with open(file_path) as fp2:
    for k, line2 in enumerate(fp2):
        for x in range(line_num[0], line_num[1] - 1):
            if k == x:
                header.append(line2)

This works well until I reach start10 <-> end10 and beyond. For example, when it checks whether a line contains "start2", it also picks up lines that contain "start21", and the same happens for the end tag. So providing "start1, end1" as input also reads "start10, end10". If I replace the line:

if i in line:

with

if i == line:

it throws an error.

How can I make sure that the script reads the line that contains ONLY "start1" and not "start10"?

Answer 1

import re
prog = re.compile('start1$')
if prog.match(line):
    print(line)

prog.match(line) returns None if there is no match and a regex match object if the line matches the compiled regex. The '$' at the end of the regex anchors the pattern to the end of the line, so 'start1' matches but 'start10' doesn't. Because match() only matches at the beginning of the string, the line must also start with the tag.

Or, another way:

import re

def test(line):
    prog = re.compile('start1$')
    return prog.match(line) is not None

>>> test('start1')
True
>>> test('start10')
False
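
The same check can be generalized to any tag. This is a minimal sketch, assuming each tag sits at the start of its line with nothing after it; `make_tag_matcher` is a hypothetical helper name that does not appear in the original answer, and `re.escape` is added so that tags containing regex metacharacters are treated literally.

```python
import re

def make_tag_matcher(tag):
    """Return a function that tests whether a line consists of exactly `tag`."""
    # re.escape keeps characters such as '.' or '+' in the tag literal.
    pattern = re.compile(re.escape(tag) + r'$')
    # match() anchors at the start of the line; '$' anchors at the end
    # (or just before a single trailing newline).
    return lambda line: pattern.match(line) is not None

is_start1 = make_tag_matcher('start1')
print(is_start1('start1\n'))   # True
print(is_start1('start10\n'))  # False
```

Because '$' also matches just before a trailing newline, lines read directly from a file do not need to be stripped first.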

Answer 2

You probably want to look into regular expressions. The Python re library has some good regex tools. It lets you define a pattern to compare your line against, and it can check for the start and end of lines.
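
As one possible illustration of the start-of-line and end-of-line anchors, the sketch below pulls out the text between two tag lines in a single pass. It assumes the whole file fits in memory and that each tag occupies a line by itself; `extract_block` is a hypothetical name, not something from the question.

```python
import re

def extract_block(text, start_tag, end_tag):
    """Return the text between two tag lines, or None if the tags are not found."""
    pattern = re.compile(
        r'^' + re.escape(start_tag) + r'\n'   # start tag alone on its line
        + r'(.*?)'                            # lazily capture the block
        + r'^' + re.escape(end_tag) + r'$',   # end tag alone on its line
        re.MULTILINE | re.DOTALL,
    )
    found = pattern.search(text)
    return found.group(1) if found else None

sample = "start10\nskip me\nend10\nstart1\nkeep this\nand this\nend1\n"
print(repr(extract_block(sample, 'start1', 'end1')))
# 'keep this\nand this\n'
```

With re.MULTILINE, '^' and '$' match at every line boundary, so 'start1' cannot match the line 'start10'. re.DOTALL lets '.' span newlines so the block can cover several lines.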

Answer 3

Since your markers are always at the end of the line, change:

start_end = ['start1','end1']

to:

start_end = ['start1\n','end1\n']
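
This also suggests why `if i == line` failed in the question: lines read from a file keep their trailing newline, so 'start1' never equals 'start1\n', `line_num` most likely stays empty, and indexing it then raises an error. An alternative sketch, assuming each tag is the only content on its line, strips the line before comparing; `find_tag_lines` is a hypothetical helper name.

```python
def find_tag_lines(lines, tags):
    """Return 1-based line numbers of lines that consist solely of one of `tags`."""
    wanted = set(tags)
    # strip() removes the trailing newline and any surrounding whitespace.
    return [num for num, line in enumerate(lines, 1) if line.strip() in wanted]

sample = ['intro\n', 'start1\n', 'body\n', 'end1\n', 'start10\n', 'other\n', 'end10\n']
print(find_tag_lines(sample, ['start1', 'end1']))  # [2, 4]
```

An open file object can be passed in place of the list, since both yield lines one at a time.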

Answer 4

If you can control the input file, consider adding an underscore (or any non-digit character) to the end of each tag.

'start1_'<->'end1_'

'start10_'<->'end10_'

The regular expression solution presented in other answers is more elegant, but it requires using regular expressions.

Answer 5

You can do this with find():

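for num, line in enumerate(fp1, 1):
    for i in start_end:
        if i in line:
            # make sure the tag isn't followed by another digit
            # (slicing avoids an IndexError when the tag ends the line)
            end = line.find(i) + len(i)
            if not line[end:end + 1].isdigit():
                line_num.append(num)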
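
The same "not followed by a digit" test can also be written as a regex negative lookahead. This is a sketch rather than part of the original answer, and `contains_tag` is a hypothetical name. Unlike the anchored patterns, it allows other text on the same line as the tag.

```python
import re

def contains_tag(line, tag):
    """True if `tag` occurs in `line` and is not immediately followed by a digit."""
    # (?!\d) succeeds only when the next character is not a digit,
    # including when the tag is at the very end of the string.
    return re.search(re.escape(tag) + r'(?!\d)', line) is not None

print(contains_tag('-- start1 --\n', 'start1'))   # True
print(contains_tag('-- start10 --\n', 'start1'))  # False
print(contains_tag('start1', 'start1'))           # True
```

Note that this only guards the end of the tag: a line containing 'restart1' would still count as containing 'start1'.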

5 answers

Answer 1
2 2015-04-29 18:54:17

Answer 2
1 2015-04-29 18:55:08

Answer 3
1 accepted 2015-04-29 20:43:01

Answer 4
0 2015-04-29 19:00:05

Answer 5
0 2015-04-29 19:01:38
