How can I find and print a string between two strings in Python?

I was wondering how I can print all the text between a \begin statement and an \end statement. This is my code now. Also, how can I keep from printing certain words located between these two statements?

def find_between(s, first, last):
    # Return the text between the first occurrence of `first` and the next `last`,
    # or an empty string if either marker is missing.
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""


f = open("file", "r")
content = f.read()
f.close()

start = "\\begin"  # backslash escaped: "\b" would be a backspace character
end = "\\end"

print(find_between(content, start, end))

This example presumes you don't mind losing the data on the \begin and \end lines. It will print all occurrences of data between \begin and \end.

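# Read the whole file into a list of lines.
with open("file", "r") as f:
    content = f.readlines()

start = "\\begin"
end = "\\end"

print("Start ==", start, "End ==", end)

printlines = False

for line in content:

    # A line containing the start marker switches printing on; the line itself is skipped.
    if start in line:
        printlines = True
        continue

    # A line containing the end marker switches printing off; it is skipped too.
    if end in line:
        printlines = False
        continue

    if printlines:
        print(line, end="")  # each line already ends with its own newline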

Input file -

test
\begin do re me fa
so la te do.


do te la so \end fa me re do

Output -

Start == \begin End == \end
so la te do.
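
The same flag-based scan can be wrapped in a function so that the blocks can be reused instead of printed straight away. This is only a minimal sketch and not part of the answer above: the name collect_blocks is made up for illustration, and it assumes, like the loop above, that the marker lines themselves are discarded.

```python
def collect_blocks(lines, start="\\begin", end="\\end"):
    """Return a list of blocks; each block is the list of lines found
    between a line containing `start` and a later line containing `end`."""
    blocks = []
    current = None  # None means "not inside a block"
    for line in lines:
        if current is None:
            if start in line:
                current = []  # open a new block, drop the marker line
        elif end in line:
            blocks.append(current)  # close the block, drop the marker line
            current = None
        else:
            current.append(line)
    return blocks


# Same content as the sample input file shown above.
sample = [
    "test\n",
    "\\begin do re me fa\n",
    "so la te do.\n",
    "\n",
    "\n",
    "do te la so \\end fa me re do\n",
]
blocks = collect_blocks(sample)
for block in blocks:
    print("".join(block), end="")
```

Note that in this sketch a block that is opened but never closed is silently dropped, whereas the loop above would keep printing until the end of the file.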

Assuming there is only one "\begin" to "\end" block in the file:

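between = ''
in_statement = False

with open('file', 'r') as f:
    for line in f:
        # '\\begin' is escaped because '\b' alone would be a backspace character.
        if '\\begin' in line:
            in_statement = True
        if in_statement:
            between += line  # keeps the marker lines as well
        if '\\end' in line:
            in_statement = False
            break

print(between)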
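
A detail that affects any snippet matching these markers: in an ordinary Python string literal, "\b" is the escape sequence for a backspace character, so a marker written as "\begin" does not contain a backslash followed by "b". The short check below illustrates this; the variable names are only for illustration.

```python
wrong = "\begin"     # "\b" is read as a single backspace character
escaped = "\\begin"  # doubled backslash: a literal backslash, then "begin"
raw = r"\begin"      # raw string: backslashes are left untouched

print(len(wrong), len(escaped), len(raw))  # 5 6 6
print(escaped == raw)                      # True
print(wrong in "text \\begin more")        # False
```

"\end" happens to keep its backslash because "\e" is not a recognized escape sequence, but recent Python versions emit a warning for it, so writing "\\end" or r"\end" is the safer habit.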

Regex is good for this sort of thing.

In [152]: import re
In [153]: s = 'this is some \\begin string that i need to check \\end some more\\begin and another \\end stuff after'
In [167]: re.findall(r'\\begin(.*?)\\end', s)
[' string that i need to check ',
 ' and another ']

The regex:

Use a raw string (r'...') so that Python passes the backslashes to the regex parser unchanged. \\begin and \\end are the literal character strings to match. The backslash has to be written twice because a backslash means 'special' to the regex parser, so \\ is needed to match one actual backslash.

.*? : the dot matches any character (except a newline, unless re.DOTALL is passed), and * means match 0 or more repetitions. The ? turns off greedy behaviour; otherwise the pattern would match everything between the FIRST begin and the LAST end instead of each individual begin/end pair.

findall then gives you a list of all matches.
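
Two points from the question are not covered by the one-line example: blocks that span several lines, and leaving out certain words. The sketch below is one possible way to handle both, under stated assumptions: re.DOTALL is added so that the dot also matches newlines, and the helper names extract_blocks and drop_words, as well as the sample text, are invented for illustration.

```python
import re


def extract_blocks(text, start=r"\begin", end=r"\end"):
    """Return every piece of text found between `start` and `end` markers."""
    # re.escape makes the markers literal; re.DOTALL lets "." match newlines.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text, flags=re.DOTALL)


def drop_words(text, words):
    """Remove whole-word occurrences of `words` from `text`."""
    if not words:
        return text
    alternation = "|".join(re.escape(w) for w in words)
    # In a regex, \b is a word boundary (not a backspace).
    cleaned = re.sub(r"\b(?:" + alternation + r")\b", "", text)
    # Collapse runs of spaces left behind by the removal.
    return re.sub(r" {2,}", " ", cleaned)


# Hypothetical sample text with one block spanning two lines.
sample = "intro \\begin do re\nme fa \\end middle \\begin so la \\end outro"
blocks = extract_blocks(sample)
print(blocks)
print([drop_words(b, ["re", "la"]) for b in blocks])
```

Reading a whole file with f.read() and passing the result to extract_blocks would apply the same idea to file contents. Matching on word boundaries is a design choice here, so that dropping "re" does not also cut into a word such as "more".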


 