I have a unicode string of this format:
unistr= [something::a.b.c][someotherthing::e.f.g]
I tried to write a regex that extracts only the strings before and after the "::"
delimiter. I tested this regex: ([\w\.]).+?(?=\:\:)
with my string in an online regex builder, and it gave me the desired result.
However, when I wrapped it in re.findall, it did not give the same result; it returned [c,g]. This is what I tried:
re.findall(r'([\w\.]).+?(?=\:\:)',unistr) #to get the string before "::"
re.findall(r'.+?([\w\.]\:\:)',unistr) # to get after "::"
What am I doing wrong?
I think you tested it wrong somehow. I modified it to this expression instead: ([\w\.])+
On Pythex it captured two groups, someotherstring and efg, which is what I think you want, right?
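One possible source of the mismatch, offered as a sketch rather than a diagnosis of the asker's exact setup: when a pattern contains a capturing group, re.findall returns the text captured by the group instead of the whole match, while many online testers highlight the whole match. The example below is written for Python 3 and assumes unistr holds the two-bracket string from the question; the second pattern is an illustrative variant, not one taken from the thread.

```python
import re

# Assumed input: the two-bracket string from the question.
unistr = '[something::a.b.c][someotherthing::e.f.g]'

# With a capturing group, findall returns only what the group captured,
# here a single character per match, not the full matched text.
group_only = re.findall(r'([\w.]).+?(?=::)', unistr)

# Without a capturing group, findall returns each whole match.
whole_match = re.findall(r'[\w.]+(?=::)', unistr)

print(group_only)
print(whole_match)  # ['something', 'someotherthing']
```

Dropping the parentheses, or using (?:...) when grouping is still needed, makes findall behave like the tester's highlighted match.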
I think you need to use finditer with the regex ([^\[]*)\:{2}([^\]]*) to get the ::-delimited contents inside the square brackets:
import re
unistr = u'unistr= [something::a.b.c]'
print [[x.group(1), x.group(2)] for x in re.finditer(ur'([^\[]*)\:{2}([^\]]*)',unistr)]
Output of the sample program:
[[u'something', u'a.b.c']]
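The snippet above uses Python 2 syntax (the print statement and the ur'' prefix). As a rough Python 3 equivalent, assuming the full two-bracket string from the question as input, the same finditer idea could look like this:

```python
import re

# Assumed input: the two-bracket string from the question.
unistr = '[something::a.b.c][someotherthing::e.f.g]'

# Group 1: text before "::" (may not contain "[").
# Group 2: text after "::" (may not contain "]").
pattern = re.compile(r'([^\[]*)::([^\]]*)')

pairs = [[m.group(1), m.group(2)] for m in pattern.finditer(unistr)]
print(pairs)  # [['something', 'a.b.c'], ['someotherthing', 'e.f.g']]
```

Because the character classes exclude the brackets, each match stays inside one bracketed pair.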
You can use the following:
import re
unistr= 'something::a.b.c'
print re.findall(r'^.+?(?=::)',unistr)
print re.findall(r'(?<=::).+?$',unistr)
Output:
['something']
['a.b.c']
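The ^ and $ anchors tie this version to a string that holds a single pair without brackets. A possible adaptation for the bracketed, multi-pair string from the question, written for Python 3 and using character classes that are my own assumption rather than part of the answer, is sketched below:

```python
import re

# Assumed input: the two-bracket string from the question.
unistr = '[something::a.b.c][someotherthing::e.f.g]'

# Runs of characters that are neither brackets nor colons,
# immediately followed by "::".
before = re.findall(r'[^\[\]:]+(?=::)', unistr)

# Runs of the same kind of characters, immediately preceded by "::".
after = re.findall(r'(?<=::)[^\[\]:]+', unistr)

print(before)  # ['something', 'someotherthing']
print(after)   # ['a.b.c', 'e.f.g']
```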
Use this:
import re
unistr = '[something::a.b.c][someotherthing::e.f.g]'
map(lambda v: v.split('::'), re.findall(r'\w+\:\:[\w\.]+', unistr))
Output:
[['something', 'a.b.c'], ['someotherthing', 'e.f.g']]
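In Python 3, map returns a lazy iterator instead of a list, so the expression above would not display the pairs directly. A minimal Python 3 sketch of the same find-then-split idea, using a list comprehension in place of map and lambda:

```python
import re

unistr = '[something::a.b.c][someotherthing::e.f.g]'

# Find each "name::dotted.value" chunk, then split it on the delimiter.
chunks = re.findall(r'\w+::[\w.]+', unistr)
pairs = [chunk.split('::') for chunk in chunks]

print(pairs)  # [['something', 'a.b.c'], ['someotherthing', 'e.f.g']]
```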
I wouldn't complicate things; this will work:
re.findall(r'(\w+)::', unistr)
It matches word characters followed by :: and captures them, returning a list of all matches.
Note that : is not a special character, so it shouldn't be escaped.
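As a quick check of both points, here is a small Python 3 sketch. The pattern for the part after the delimiter is my own mirror of the answer's pattern, not something the answer states, and the input is assumed to be the string from the question.

```python
import re

unistr = '[something::a.b.c][someotherthing::e.f.g]'

# Word characters immediately before "::" (the pattern from the answer).
before = re.findall(r'(\w+)::', unistr)

# Assumed counterpart: word characters and dots immediately after "::".
after = re.findall(r'::([\w.]+)', unistr)

# Escaping the colons is harmless but unnecessary: same matches either way.
escaped = re.findall(r'(\w+)\:\:', unistr)

print(before)  # ['something', 'someotherthing']
print(after)   # ['a.b.c', 'e.f.g']
print(escaped == before)  # True
```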