Regex doesn't work with re.findall

I have a unicode string of this format:

unistr= [something::a.b.c][someotherthing::e.f.g]

I tried to write a regex that extracts only the strings before and after the "::" delimiter. I tested this regex: ([\w\.]).+?(?=\:\:) against my string in an online regex builder, and it gave me the desired result.

However, when I wrapped it in re.findall, it doesn't give me the same result; it returns [c,g]. This is what I tried:

re.findall(r'([\w\.]).+?(?=\:\:)',unistr) #to get the string before "::"
re.findall(r'.+?([\w\.]\:\:)',unistr) # to get after "::"

What am I doing wrong?

I think you tested it incorrectly somehow. I tried the expression ([\w\.])+ on Pythex instead, and it captured two groups, someotherstring and efg , which I think is what you want, right?
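One thing worth checking when moving a pattern like this from an online tester into re.findall: a quantifier placed outside a capturing group repeats the group, and the group then keeps only its last iteration. The sketch below assumes the bracketed string from the question and uses illustrative variable names; it shows the behaviour and is not a reproduction of the Pythex session.

```python
import re

# Sample string from the question (variable names here are illustrative).
unistr = '[something::a.b.c][someotherthing::e.f.g]'

# Quantifier outside the group: each match keeps only its last character.
last_chars = re.findall(r'([\w.])+', unistr)
print(last_chars)  # ['g', 'c', 'g', 'g']

# Quantifier inside the group: each match keeps the whole run.
whole_runs = re.findall(r'([\w.]+)', unistr)
print(whole_runs)  # ['something', 'a.b.c', 'someotherthing', 'e.f.g']
```

Moving the + inside the parentheses is usually the smaller change when single characters come back where whole words were expected.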

I think you need to use finditer with the regex ([^\[]*)\:{2}([^\]]*) to get the ::-delimited contents inside the square brackets:

import re
unistr = 'unistr= [something::a.b.c]'
# Group 1: text before "::" containing no "["; group 2: text after "::" up to "]"
print([[x.group(1), x.group(2)] for x in re.finditer(r'([^\[]*)\:{2}([^\]]*)', unistr)])

Output of a sample program:

[['something', 'a.b.c']]

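You can use the following:

import re
unistr = 'something::a.b.c'
# Lazy match from the start of the string up to (not including) "::"
print(re.findall(r'^.+?(?=::)', unistr))
# Everything after "::" up to the end of the string
print(re.findall(r'(?<=::).+?$', unistr))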

Output:

['something']
['a.b.c']
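
The ^ and $ anchors above assume the string holds a single pair. For the bracketed string in the question, which holds several pairs, the same lookaround idea could be applied without anchors. This is a sketch that assumes names and values consist only of word characters and dots:

```python
import re

unistr = '[something::a.b.c][someotherthing::e.f.g]'

# Runs of word characters/dots immediately followed by "::".
before = re.findall(r'[\w.]+(?=::)', unistr)
# Runs of word characters/dots immediately preceded by "::".
after = re.findall(r'(?<=::)[\w.]+', unistr)

print(before)  # ['something', 'someotherthing']
print(after)   # ['a.b.c', 'e.f.g']
```

Because neither pattern contains a capturing group, re.findall returns the full text of each match.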

Use this:

import re
unistr = '[something::a.b.c][someotherthing::e.f.g]'
# Find each "name::value" pair, then split it on the delimiter
print([v.split('::') for v in re.findall(r'\w+\:\:[\w\.]+', unistr)])

Output:

[['something', 'a.b.c'], ['someotherthing', 'e.f.g']]

I wouldn't complicate things, this will work:

re.findall(r'(\w+)::', unistr)

It matches one or more word characters followed by :: and captures only the word characters; re.findall then returns a list of the captured groups.

Note that : is not a special character in a regex, so it does not need to be escaped.
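
To illustrate how re.findall treats capture groups: with no group it returns whole matches, with one group it returns that group's text, and with several groups it returns tuples. A minimal sketch on the string from the question; the second group, ([\w.]+), is an assumption about which characters the values may contain:

```python
import re

unistr = '[something::a.b.c][someotherthing::e.f.g]'

# No group: the whole match is returned, delimiter included.
whole = re.findall(r'\w+::', unistr)
# One group: only the captured text is returned.
names = re.findall(r'(\w+)::', unistr)
# Two groups: one (before, after) tuple per match.
pairs = re.findall(r'(\w+)::([\w.]+)', unistr)

print(whole)  # ['something::', 'someotherthing::']
print(names)  # ['something', 'someotherthing']
print(pairs)  # [('something', 'a.b.c'), ('someotherthing', 'e.f.g')]
```

The two-group form gets both sides of the delimiter in a single pass.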
