re.sub with Japanese Characters

Question

I have the following string:

s = u'アガサ・クリスティー　奥さまは名探偵　～パディントン発4時50分～（字幕版）'

However, when I try to remove the character （ and everything after it, the pattern doesn't match and the string comes back unchanged:

>>> print re.sub(r'\（.+$', '', s)
アガサ・クリスティー　奥さまは名探偵　～パディントン発4時50分～（字幕版）

How would I get the string to be just:

アガサ・クリスティー　奥さまは名探偵　～パディントン発4時50分～

?

Answer 1

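In Python 2, you should ensure that all of the parameters to re.sub() (pattern, replacement, and subject string) are the same type -- either all str or all unicode. Here the subject is unicode but the pattern is a byte string, so the pattern holds the encoded bytes of （ rather than the character itself and never matches. Try this: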

# encoding: utf-8

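import re
s = u'アガサ・クリスティー　奥さまは名探偵　～パディントン発4時50分～（字幕版）'
# ur'' is a raw unicode literal (Python 2 only), so the pattern and the
# replacement are unicode, the same type as s.
print re.sub(ur'\（.+$', u'', s)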
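
For anyone on Python 3, here is a minimal sketch of the same substitution. It is not part of the original answer. It assumes Python 3, where every string literal is already Unicode, so the u and ur prefixes are unnecessary (ur is in fact a syntax error there). The second half roughly mimics the Python 2 mismatch, assuming the source or terminal encoding was UTF-8; the names result, paren_bytes, mismatched_pattern and unchanged are only illustrative.

```python
import re

s = 'アガサ・クリスティー　奥さまは名探偵　～パディントン発4時50分～（字幕版）'

# Python 3: pattern, replacement and subject are all str (Unicode), so the
# fullwidth parenthesis U+FF08 matches directly and needs no backslash.
result = re.sub(r'（.+$', '', s)
print(result)

# Why the Python 2 call in the question failed (assuming UTF-8 input): a
# byte-string pattern holds the UTF-8 bytes of the character, not the
# single code point U+FF08.
paren_bytes = '（'.encode('utf-8')
print(paren_bytes)

# Reading those bytes one-to-one as code points (latin-1) approximates how
# the byte pattern was compared against the unicode subject: three
# unrelated characters, none of which occur in s.
mismatched_pattern = paren_bytes.decode('latin-1') + '.+$'
unchanged = re.sub(mismatched_pattern, '', s)
print(unchanged == s)
```

Note that Python 3 refuses to mix the two types at all: passing a bytes pattern with a str subject raises TypeError instead of silently failing to match.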