
python regex re.compile match

I am trying to match (using a regex in Python):

http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg

in the following string:

http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'

My code has something like this:

temp="http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"
dummy=str(re.compile(r'.com'',,''(.*?)'',,''Model Photo').search(str(temp)).group(1))

I do not think the value of "dummy" is correct, and I am unsure how to "escape" the single and double quotes in the pattern I pass to re.compile.

I tried googling the problem, but I couldn't find anything relevant.

Would appreciate any guidance on this.

Thanks.

The easiest way to deal with strings in Python that contain escape characters and quotes is to wrap the string in triple double-quotes (""") and prefix it with r. For example:

my_str = r"""This string would "really "suck"" to write if I didn't
 know how to tell Python to parse it as "raw" text with the 'r' character and
 triple " quotes. Especially since I want \n to show up as a backslash followed
 by n. I don't want \0 to be the null byte either!"""

The r prefix means "treat backslashes literally": escape sequences such as \n are not interpreted. The triple double-quotes (""") prevent single quotes, double quotes, and pairs of double quotes from prematurely ending the string.
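As a rough sketch (Python 3, not part of the original answer), here is the question's pattern written as a raw, triple-double-quoted literal. Escaping the dot as \. is my own addition so that it matches a literal period rather than any character; the name url is illustrative only.

```python
import re

# The question's input string (double-quoted, so the single quotes inside are plain characters)
temp = "http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"

# Raw + triple double-quotes: the backslash in \. reaches the regex engine untouched,
# and the single quotes inside cannot end the literal early.
pattern = re.compile(r"""\.com','(.*?)','Model Photo""")

match = pattern.search(temp)
# search() returns None when nothing matches, so guard before calling group()
url = match.group(1) if match else None
print(url)
```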

EDIT: I expanded the example to include things like \0 and \n. In a normal string (not a raw string), a \ (the escape character) signifies that the next character has special meaning. For example, \n means "the newline character". If you literally wanted the character \ followed by n in your string, you would have to write \\n, or just use a raw string instead, as I show in the example above.
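A quick Python 3 check of the difference described above; the variable names are illustrative only:

```python
# In a normal string literal, backslash sequences are interpreted
normal = "a\nb"     # three characters: 'a', newline, 'b'
raw = r"a\nb"       # four characters: 'a', backslash, 'n', 'b'
escaped = "a\\nb"   # doubling the backslash gives the same four characters as the raw string

print(len(normal), len(raw), len(escaped))  # 3 4 4
print(raw == escaped)                       # True
print(len("\0"), len(r"\0"))                # 1 2 (null byte vs. backslash + '0')
```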

You can also read about string literals in the Python documentation.

Try triple quotes:

import re
pattern = r""".*http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg.*"""
# Inside triple quotes the single quotes need no escaping (and \, is not a valid escape)
text = """http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"""
x = re.match(pattern, text)
if x is not None:
    print(x.group())

Also, since this uses re.match(), the pattern needs .* at the beginning: re.match() only matches at the start of the string. I added .* at the end as well, so the match covers the whole string.
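To illustrate why the leading .* matters for re.match() but not for re.search(), here is a small Python 3 sketch. It uses re.escape() so the dots in the URL match literally; that choice, and the names text and target, are mine rather than the answer's.

```python
import re

text = "http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"
target = "http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg"

# re.escape() turns the URL into a pattern that matches it literally (dots included)
literal = re.escape(target)

# re.match() only tries position 0, so without a leading .* it finds nothing here
print(re.match(literal, text))                     # None
# A leading .* lets re.match() skip ahead to the URL
print(re.match(".*" + literal, text) is not None)  # True
# re.search() scans the whole string, so no .* is needed at all
print(re.search(literal, text).group())
```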

Commas don't need to be escaped, and single quotes don't need to be escaped if you use double quotes to create the string:

>>> dummy=re.compile(r".com','(.*?)','Model Photo").search(temp).group(1)
>>> print(dummy)
http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg
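For comparison, this Python 3 sketch shows what the question's original literal actually turns into. Python joins adjacent string literals, so the quotes in the question's pattern act as literal delimiters and never reach the regex. The names question_pattern and fixed are illustrative.

```python
import re

temp = "http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"

# The question's literal is really five adjacent literals that Python concatenates:
#   r'.com'  ',,'  '(.*?)'  ',,'  'Model Photo'
question_pattern = r'.com'',,''(.*?)'',,''Model Photo'
print(question_pattern)                   # .com,,(.*?),,Model Photo

# The quote characters are gone, so the pattern never matches and search() returns None;
# calling .group(1) on that None is what raises AttributeError.
print(re.search(question_pattern, temp))  # None

# Writing the pattern in double quotes keeps the single quotes as real characters
fixed = r".com','(.*?)','Model Photo"
print(re.search(fixed, temp).group(1))
```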

Note that I also removed some unnecessary str() calls. For future reference, if you ever do need to escape single or double quotes (say your string contains both), use a backslash like this:

'.com\',\'(.*?)\',\'Model Photo'
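A small Python 3 check (illustrative names) that the backslash-escaped, non-raw literal above is the same string as the double-quoted pattern, so it matches the same way:

```python
import re

temp = "http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"

# In a normal (non-raw) literal, \' becomes a plain ' and the backslash disappears
escaped = '.com\',\'(.*?)\',\'Model Photo'
double_quoted = ".com','(.*?)','Model Photo"

print(escaped == double_quoted)           # True
print(re.search(escaped, temp).group(1))
```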

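As mykhal pointed out in the comments, this does not combine nicely with regular expressions: inside a raw string (r'...') a backslash still keeps the quote from ending the string, but the backslash itself stays in the pattern, so you end up juggling two layers of escaping. A better solution is to use triple-quoted strings, as other answers suggested.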

If you use double quotes (which mean the same as single quotes in Python), you don't have to escape anything at all in this case. You can even drop the leading r, because the pattern contains no backslashes:

re.compile(".com','(.*?)','Model Photo")
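A quick Python 3 check of this point (illustrative names): when the literal contains no backslashes, the r prefix produces the identical string.

```python
import re

temp = "http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"

# No backslashes in either literal, so the r prefix changes nothing
plain = ".com','(.*?)','Model Photo"
raw = r".com','(.*?)','Model Photo"

print(plain == raw)                             # True
print(re.compile(plain).search(temp).group(1))
```

If you later escape the dot as \.com to match only a literal period, the r prefix becomes worthwhile again; recent Python versions warn about \. in a non-raw literal.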
