Replacing duplicated letters in Unicode strings in Python

Question

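I need to collapse letters that were typed twice by mistake in a string, for example "bbig". My code works only for Latin letters, not for Cyrillic ones. I am using Python 2.6.6 on CentOS Linux.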

#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
def reg(item):
  item = re.sub(r'([A-ZА-ЯЁЄЇІ])\1', r'\1', item, re.U)
  # this variant also works only with Latin letters
  #item = re.sub(r'(.)\1', r'\1', item, re.U)
  return item

print reg('ББООЛЛЬЬШШООЙЙ')
print reg('BBIIGG')

The code above returns:

ББООЛЛЬЬШШООЙЙ
BIG

What am I doing wrong? Thanks for your help.

Answer 1

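You are using byte strings, so everything you do matches and replaces bytes, not letters. In UTF-8 each of these Cyrillic letters takes two bytes, so a doubled letter never produces two identical adjacent bytes and the pattern has nothing to match.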
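To make the byte-versus-letter difference concrete, here is a small sketch. It assumes the text is UTF-8 encoded, as the coding declaration in the question suggests; the variable names are only illustrative, and the snippet is written so that it should run on both Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
import re

text = u'ББ'                  # two identical Cyrillic letters (U+0411 twice)
raw = text.encode('utf-8')    # the byte string a byte-level regex actually sees

# Each letter is encoded as the two bytes 0xD0 0x91, so the byte sequence is
# D0 91 D0 91 and no two adjacent bytes are equal.
byte_values = [hex(b) for b in bytearray(raw)]

# Byte-level pattern on the byte string: finds no doubled byte.
byte_match = re.search(br'(.)\1', raw, re.S)

# Character-level pattern on the unicode string: finds the doubled letter.
char_match = re.search(u'(.)\\1', text, re.U)
```

Here `byte_match` is `None`, while `char_match` captures the single letter Б.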

Use unicode strings instead:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
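def reg(item):
  # Compile with re.U: the fourth positional argument of re.sub() is count,
  # not flags, so passing re.U there would only cap the replacements at 32.
  pattern = re.compile(ur'([A-ZА-ЯЁЄЇІ])\1', re.U)
  item = pattern.sub(r'\1', item)
  # alternative from the question: collapse any doubled character
  #item = re.compile(ur'(.)\1', re.U).sub(r'\1', item)
  return item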

print reg(u'ББООЛЛЬЬШШООЙЙ')
print reg(u'BBIIGG')

Note that this works for precomposed characters, but it will fall flat for characters built with combining marks.
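One possible way around the combining-mark problem, offered as a sketch and not as part of the original answer, is to normalize the input to NFC before matching, so that a base letter plus combining mark is composed into a single code point where one exists. The helper name `collapse` is hypothetical. Sequences that have no precomposed form would still be missed.

```python
# -*- coding: utf-8 -*-
import re
import unicodedata

# Same character class as in the answer, compiled once with re.U.
DOUBLED = re.compile(u'([A-ZА-ЯЁЄЇІ])\\1', re.U)

def collapse(text):
    # Hypothetical helper: compose to NFC first, then collapse doubled letters.
    composed = unicodedata.normalize('NFC', text)
    return DOUBLED.sub(u'\\1', composed)

precomposed = u'\u0419\u0419'              # ЙЙ as two single code points
decomposed = u'\u0418\u0306\u0418\u0306'   # ЙЙ as И + combining breve, twice

# Without normalization nothing changes: no two adjacent code points are equal.
without_nfc = DOUBLED.sub(u'\\1', decomposed)

# With normalization both spellings collapse to a single Й (U+0419).
with_nfc = collapse(decomposed)
```

Both `collapse(precomposed)` and `collapse(decomposed)` give a single Й, while `without_nfc` is still the original decomposed string.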

It will also be disastrous if the user is trying to type this sentence (hint: check its second word).

Solution 1
2 accepted 2013-05-24 13:26:15
