How do I match accented characters with regular expressions in Python?

Question

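I need a solution to this same problem, but in Python! I already tried installing the regex library for Python, since apparently it lets you use POSIX expressions in Python regular expressions, but I guess its [:alpha:] class doesn't include Unicode characters. For example: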

>>> re.search(r'[[:alpha:] ]+','Please work blåbær and NOW stop 123').group(0)
'Please work bl'

when I want it to match Please work blåbær and NOW stop

Edit: I'm using Python 2.7

Edit 2: I tried the following:

>>> re.search(re.compile('[\w ]+', re.UNICODE),'Please work blåbær and NOW stop 123').group(0)
'Please work bl\xc3'

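That's not what I want (I want it to keep matching past the first non-ASCII character), but at least it matches one more character than before. What should I do here to make it match the rest of what I want?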

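Edit 3: I don't want to match any non-"word" characters; by "word" I mean a-z, A-Z, space, and any accented variant of a word character. I hope I'm getting my idea across; in a phrase like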

lets match força, but stop before that comma

I only want to match lets match força

Edit 4: So I tried using Python 3 for this script:

>>> re.search(re.compile('[\w ]+', re.UNICODE),'lets match força, but stop before that comma').group(0)
'lets match força'

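I guess it mostly works in Python 3, except that it also matches digits (which I definitely don't want) and underscores. Is there any way to fix this, in either Python 2 or 3?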

Answer 1

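It's not clear which Python version you're using. If you're on 2.x, you are probably running into Unicode issues. See this post for more pointers, and feel free to update your question with more details.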
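As the \xc3 in Edit 2's output suggests, the 2.x pattern is being run over undecoded UTF-8 bytes, so "å" arrives as two separate bytes rather than one letter. The sketch below reproduces the same effect in Python 3, assuming the input is UTF-8; the names TEXT, raw, bytes_match and text_match are illustrative only.

```python
import re

TEXT = "Please work blåbær and NOW stop 123"

# Raw UTF-8 bytes, roughly what a plain '...' literal holds in a Python 2
# session with a UTF-8 terminal. "å" becomes the two bytes b"\xc3\xa5".
raw = TEXT.encode("utf-8")

# A bytes pattern only treats ASCII [a-zA-Z0-9_] as \w, so the match
# stops at the first non-ASCII byte.
bytes_match = re.search(rb"[\w ]+", raw).group(0)

# Decoding to text first lets \w see "å" and "æ" as single letters.
text_match = re.search(r"[\w ]+", raw.decode("utf-8")).group(0)

print(bytes_match)        # b'Please work bl'
print(repr(text_match))   # 'Please work blåbær and NOW stop 123'
```

In Python 2.7 the corresponding fix is to decode the input first (s.decode('utf-8'), or use a u'' literal) and pass re.UNICODE. Note that the decoded match still includes the digits 123, which is the remaining problem from Edit 4.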

I'm surprised that I couldn't get the accented characters converted into a proper Unicode representation...

But there are workarounds:

re.search(re.compile(r'((\w+\s)|(\w+\W+\w+\s))+', re.UNICODE), ur'Please work blåbær and NOW stop 123').group(0)

Or:

re.search(re.compile(r'\D+', re.UNICODE), ur'Please work blåbær and NOW stop 123').group(0)
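For comparison, here is a Python 3 sketch (assuming 3.6+ for the f-string) that ports both workarounds above and runs them on both sample phrases from the question. In Python 3 the ur'' prefix is a syntax error and str patterns use Unicode matching by default, so neither the prefix nor re.UNICODE is needed. It also adds one alternative that is not from this answer: the class [^\W\d_], meaning "a word character that is neither a digit nor an underscore", alternated with a literal space. The helper name first_match is hypothetical.

```python
import re
import unicodedata

# The two workarounds from the answer, ported to Python 3.
WORKAROUND_1 = re.compile(r"((\w+\s)|(\w+\W+\w+\s))+")
WORKAROUND_2 = re.compile(r"\D+")

# Alternative (not from the answer): a word character minus digits and
# underscore, which in practice means a letter in any script, or a space.
LETTERS_AND_SPACES = re.compile(r"(?:[^\W\d_]| )+")


def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    # NFC folds a decomposed accent (e.g. "e" + U+0301) into one precomposed
    # character; \w does not match a bare combining mark on its own.
    m = pattern.search(unicodedata.normalize("NFC", text))
    return m.group(0) if m else None


if __name__ == "__main__":
    samples = ("Please work blåbær and NOW stop 123",
               "lets match força, but stop before that comma")
    patterns = (("workaround 1", WORKAROUND_1),
                ("workaround 2", WORKAROUND_2),
                ("letters/spaces", LETTERS_AND_SPACES))
    for text in samples:
        for name, pattern in patterns:
            print(f"{name:15} {first_match(pattern, text)!r}")
```

On the phrase from Edit 3, only the letters-and-spaces pattern stops before the comma and returns lets match força; the first workaround runs past the comma (its \w+\W+\w+\s branch allows punctuation between words), and \D+ matches the whole phrase because it contains no digits. On the blåbær sample all three stop before 123 but keep the trailing space, so add .rstrip() if that matters.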