This is my code in a Django view (intentionally simplified, Python 2.7):
# -*- coding: utf-8 -*-
from django.shortcuts import render
import re

def index(request):
    found_verses = []
    pattern = re.compile('ю')
    with open('d.txt', 'r') as doc:
        for line in doc:
            found = pattern.search(line)
            if found:
                modified_line = pattern.sub('!'+'\g<0>'+'!',line)
                found_verses.append(modified_line)
    context = {'found_verses': found_verses}
    return render(request, 'myapp/index.html', context)
d.txt (also UTF-8) contains this one line (intentionally simplified):
1. Я сказал Юлию одному.
The above, when rendered, gives me the expected result:
1. Я сказал Юли!ю! одному.
When I change to a capital letter, pattern = re.compile('Ю'), it also gives me the expected result:
1. Я сказал !Ю!лию одному.
But when I change to a character class, such as pattern = re.compile('[юЮ]') or pattern = re.compile('[Юю]') or pattern = re.compile('[ю]') or pattern = re.compile('[Ю]'), it gives me nothing. What I am trying to get is this:
1. Я сказал !Ю!ли!ю! одному.
Please help me get this result. I've been struggling for more than a day and have tried different configurations like pattern = re.compile('[юЮ]', re.UNICODE) and pattern = re.compile('ю', re.UNICODE|re.I) and this and countless others, but all in vain.
with io.open('d.txt', 'r', encoding='utf-8') as doc:
    ...
...
pattern = re.compile(u'[юЮ]', re.UNICODE)
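For anyone who wants to try this outside Django, here is a minimal sketch of the same idea: decode the file while reading it and compile the pattern from a unicode string. The function name find_verses and the temporary file standing in for d.txt are my own assumptions, not part of the original view, and the unicode_literals import is only there so the sketch behaves the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import io
import os
import re
import tempfile

# Unicode pattern: the class holds two characters, not four UTF-8 bytes.
PATTERN = re.compile('[юЮ]', re.UNICODE)


def find_verses(path):
    """Return matching lines of a UTF-8 file with each match wrapped in '!'.

    Hypothetical helper mirroring the loop in the question's view.
    """
    found_verses = []
    # io.open decodes while reading, so every line is text rather than bytes.
    with io.open(path, 'r', encoding='utf-8') as doc:
        for line in doc:
            if PATTERN.search(line):
                found_verses.append(PATTERN.sub(r'!\g<0>!', line))
    return found_verses


# Throwaway stand-in for d.txt so the sketch runs on its own.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(sample_path, 'w', encoding='utf-8') as out:
    out.write('1. Я сказал Юлию одному.\n')

result = find_verses(sample_path)
os.remove(sample_path)
```

In the view, the returned list would go into the template context exactly as found_verses does in the question.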
Just a guess, but try this:
with open('d.txt', 'rb') as doc:  # the b flag is probably not needed for UTF-8, but it does no harm
    for line in doc:
        line = line.decode("utf8")
        ...
The problem is probably that you are using regular strings, not unicode strings. The re library needs to know how to treat the bytes in your RE. Try re.compile(u'ю') (note that this is how @Ignacio does it in his answer).
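To make the bytes-versus-unicode point concrete, here is a small sketch of what probably goes wrong with the character class. It assumes the source file and d.txt are both UTF-8, as the question says. The variable names are mine, and the byte pattern is built explicitly so the snippet runs on Python 3 as well as 2.7. I am only guessing that the broken bytes are why the page showed nothing.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import re

text = '1. Я сказал Юлию одному.'
raw = text.encode('utf-8')  # the bytes that Python 2's plain open() yields

# Each of these Cyrillic letters occupies two bytes in UTF-8.
small, capital = 'ю'.encode('utf-8'), 'Ю'.encode('utf-8')
byte_lengths = (len(small), len(capital))

# A byte-string class therefore lists four single bytes, not two letters,
# and it matches the lead bytes of many other Cyrillic letters as well.
byte_pattern = re.compile(b'[' + small + capital + b']')
byte_hits = byte_pattern.findall(raw)

# Wrapping lone bytes in '!' splits the two-byte sequences apart.
broken = byte_pattern.sub(br'!\g<0>!', raw)
try:
    broken.decode('utf-8')
    still_valid_utf8 = True
except UnicodeDecodeError:
    still_valid_utf8 = False

# With text on both sides, the class matches whole characters.
text_pattern = re.compile('[юЮ]', re.UNICODE)
fixed = text_pattern.sub(r'!\g<0>!', text)
```

A single-letter byte pattern such as 'ю' still works because the two bytes are matched as a sequence. Only inside [...] are they treated as separate alternatives.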