
Python - How to find and save all words between two specific strings

While searching I found this thread. It is quite close to what I need.

This leads directly to my first problem:

The string I have is:

line = '<draw:line draw:style-name="gr1" draw:text-style-name="P1" draw:layer="layout" svg:x1="0cm" svg:y1="0cm" svg:x2="3.5cm" svg:y2="2.7cm">'

I need to extract the value of the svg:x1 attribute. So I tried

print re.findall(r"(?<=svg:x1) (.*?) (?=svg:y1)", line)

But nothing except [] is printed.

The second problem: I then tried something like

line = 'string1 string2 string3'

and then

print re.findall(r"(?<=string1) (.*?) (?=string3)", line)

which gives what I want, but when I try

file.write(re.findall(r"(?<=string1) (.*?) (?=string3)", line))

(The file I want to write to is of course defined before, so I can write stuff to it)

I get "TypeError: expected a character buffer object"

So now my question in one complete sentence:
How can I extract a string between two specific strings and save it in a file?

The following regex

print re.findall(r"(?<=svg:x1) (.*?) (?=svg:y1)", line)

you wrote expects a space after svg:x1, which your original string does not have. The corrected regex would be

print re.findall(r"(?<=svg:x1)(.*?)(?= svg:y1)", line)
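To see the difference between the two patterns, here is a small check. It assumes line holds the draw:line text from the question as a quoted Python string, and it uses the parenthesized print so it runs on Python 3 as well. Note that the corrected pattern still captures the equals sign and the quotes, because everything between svg:x1 and the space before svg:y1 falls inside the group.

```python
import re

# The string from the question, quoted so it is a valid Python literal.
line = ('<draw:line draw:style-name="gr1" draw:text-style-name="P1" '
        'draw:layer="layout" svg:x1="0cm" svg:y1="0cm" '
        'svg:x2="3.5cm" svg:y2="2.7cm">')

# Original attempt: requires a space right after "svg:x1", but the next
# character is "=", so nothing matches.
with_spaces = re.findall(r"(?<=svg:x1) (.*?) (?=svg:y1)", line)

# Corrected pattern: no space after svg:x1, one space before svg:y1.
corrected = re.findall(r"(?<=svg:x1)(.*?)(?= svg:y1)", line)

print(with_spaces)  # []
print(corrected)    # ['="0cm"']
```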

re.findall returns a list, but file.write expects a string, which is why you get the TypeError. Join the items into one string (or iterate over the list and write each item) before writing to the file.

data = re.findall(r"(?<=svg:x1)(.*?)(?= svg:y1)", line)
fl.write(' '.join(data))  # fl is the already opened output file
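A fuller sketch of the write step, using the string1/string3 example from the question. The question does not show how the file is opened, so the output path here (matches.txt in the system temp directory) is only an assumption for illustration.

```python
import os
import re
import tempfile

line = 'string1 string2 string3'

# findall returns a list of strings, here ['string2'].
data = re.findall(r"(?<=string1) (.*?) (?=string3)", line)

# Hypothetical output location; replace with your own path.
out_path = os.path.join(tempfile.gettempdir(), 'matches.txt')

# write() needs a single string, so join the matches first.
# One match per line keeps the file easy to read back.
with open(out_path, 'w') as fl:
    fl.write('\n'.join(data))

# Read the file back to confirm what was written.
with open(out_path) as fl:
    written = fl.read()

print(written)  # string2
```

Opening the file in a with block closes it automatically, even if the write raises an error.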

Don't use file as a variable name. It is not a reserved word, but in Python 2 it is the name of a built-in type, and assigning to it shadows that built-in.

You can do it without regex, something like this.

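def get_middle_text(line, string_start, string_end):
    # Text after the first occurrence of string_start
    # (raises IndexError if string_start is not in line).
    temp = line.split(string_start)[1]
    # Keep only what comes before the next string_end.
    return temp.split(string_end)[0]


# For line = 'string1 string2 string3' this returns ' string2 '.
result = get_middle_text(line, 'string1', 'string3')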
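The function above raises IndexError when string_start does not occur in the line. A minimal sketch of a variant that returns None instead, built on str.partition; the name get_middle_text_safe is hypothetical and not part of the answer.

```python
def get_middle_text_safe(line, string_start, string_end):
    """Return the text between the first string_start and the next
    string_end, or None if either delimiter is missing."""
    # partition splits at the first occurrence and returns a 3-tuple;
    # the separator slot is empty when nothing was found.
    _, found_start, rest = line.partition(string_start)
    if not found_start:
        return None
    middle, found_end, _ = rest.partition(string_end)
    if not found_end:
        return None
    return middle


# Surrounding spaces are kept, so this prints ' string2 '.
print(get_middle_text_safe('string1 string2 string3', 'string1', 'string3'))
# The start delimiter is absent, so this prints None.
print(get_middle_text_safe('string1 string2 string3', 'missing', 'string3'))
```

Call .strip() on the result if the surrounding whitespace is not wanted.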

Edit:

If there can be multiple matches, you can do something like the following, which returns a list of matches:

def get_middle_text(line, string_start, string_end):
    tmp = line.split(string_start)
    result = []
    if len(tmp) == 1:
        return result
    for x in range(1, len(tmp)):
        temp = tmp[x].split(string_end)[0]
        result.append(temp)
    return result
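For comparison, roughly the same multi-match behaviour can be sketched with re.findall and re.escape. The helper name find_all_between is hypothetical. One difference worth knowing: the split-based version also returns a trailing piece whose end delimiter is missing, while this one only returns text that is enclosed by both delimiters.

```python
import re


def find_all_between(line, string_start, string_end):
    # re.escape makes the delimiters match literally, even if they
    # contain regex metacharacters such as "[" or ".".
    pattern = re.escape(string_start) + r"(.*?)" + re.escape(string_end)
    return re.findall(pattern, line)


# The last "a[3" has no closing "]b", so it is not reported.
sample = 'a[1]b a[2]b a[3'
print(find_all_between(sample, 'a[', ']b'))  # ['1', '2']
```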

Is this what you want?

In [10]: re.findall('svg:x1="([^"]*)"', line)
Out[10]: ['0cm']

Edit:

To capture only the number, without the unit:

In [11]: re.findall(r'svg:x1="(\d*)cm"', line)
Out[11]: ['0']
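Note that \d* would not match a decimal value such as 3.5cm. A sketch that collects all four svg coordinates as floats, assuming every value is a plain non-negative decimal number followed by cm, as in the question's string:

```python
import re

line = ('<draw:line draw:style-name="gr1" draw:text-style-name="P1" '
        'draw:layer="layout" svg:x1="0cm" svg:y1="0cm" '
        'svg:x2="3.5cm" svg:y2="2.7cm">')

# Capture the attribute name (x1, y1, ...) and its numeric value.
# [\d.]+ accepts digits and a decimal point; it does not handle signs
# or units other than cm.
pairs = re.findall(r'svg:(\w+)="([\d.]+)cm"', line)

# Build a name -> float mapping from the captured pairs.
coords = {name: float(value) for name, value in pairs}
print(coords)
```

With two capture groups, re.findall returns a list of tuples, which is why the pairs unpack directly into name and value.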
