
Regex pattern with accented characters

I am trying to get the words that start with a capital letter, regardless of whether the word contains a special character. Currently, my pattern only matches words whose letters have no accents.

I don't need numbers or hyphens, just letters with accents or other special characters.

pattern = r"\b[A-Z][a-z]*\b"
name = soup.select('h1.data-header__headline-wrapper')[0].text.strip()
name = re.findall(pattern, name)
name = " ".join(name)

Some examples. Special characters should be included so that players 1 and 4 are returned correctly.

Álvaro Fernández
[]

#3                    
                                            Rico Henry
['Rico', 'Henry']
Rico Henry
#24                    
                                            Tariqe Fosu
['Tariqe', 'Fosu']
Tariqe Fosu
#29                    
                                            Mads Bech Sørensen
['Mads', 'Bech']
Mads Bech

You need to run pip install regex in your console and then use:

import regex
pattern = r"\b\p{Lu}\p{Ll}*\b"
name = soup.select('h1.data-header__headline-wrapper')[0].text.strip()
name = regex.findall(pattern, name)
name = " ".join(name)

Here,

  • \b - a word boundary
  • \p{Lu} - an uppercase letter
  • \p{Ll}* - zero or more lowercase letters.
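To see the pattern at work outside the scraper, here is a minimal sketch that runs it on plain strings. The `headlines` list is an assumption: it stands in for the text of the scraped h1 elements, using the names from the question, and it requires the third-party regex package.

```python
import regex  # third-party package: pip install regex

# Uppercase letter, then zero or more lowercase letters, as a whole word
PATTERN = regex.compile(r"\b\p{Lu}\p{Ll}*\b")

# Hypothetical stand-ins for the scraped headline text
headlines = [
    "Álvaro Fernández",
    "#3\n    Rico Henry",
    "#29\n    Mads Bech Sørensen",
]

# Squad numbers and whitespace are skipped because they are not letters
names = [" ".join(PATTERN.findall(h)) for h in headlines]
for n in names:
    print(n)
```

The squad number and the surrounding whitespace drop out on their own, so no extra cleanup is needed before joining.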

If you want to use the core "re" module in Python, one option is to add to the character class every Unicode letter you expect that is outside the range A-Z. You can also pass the re.UNICODE flag to the findall() function; in Python 3 that behavior is already the default for str patterns, so the flag is redundant there but harmless.

For example:

import re

s = "Ébc Ábc Abc Cámara Corazón Señor"
# À, Á and É are listed explicitly; add any other accented capitals you expect
name = re.findall(r"(\b[A-ZÀÁÉ]\S*\b)", s, re.UNICODE)
print(name)

Output:

['Ébc', 'Ábc', 'Abc', 'Cámara', 'Corazón', 'Señor']
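If listing every accented capital by hand is impractical, a different stdlib-only approach is possible. This is a sketch, not part of the answer above: it finds all-letter words with the `[^\W\d_]` idiom and then keeps those whose first character passes `str.isupper()`. The helper name `capitalized_words` is hypothetical, and the NFC normalization step is an added assumption that the input may contain decomposed accents.

```python
import re
import unicodedata

# [^\W\d_] means "a word character that is neither a digit nor an underscore",
# which is roughly "any Unicode letter" for str patterns in Python 3.
LETTER_WORD = re.compile(r"\b[^\W\d_]+\b")

def capitalized_words(text):
    """Return the all-letter words in text whose first letter is uppercase."""
    # NFC merges a base letter plus a combining accent into one code point,
    # so "A" + U+0301 is treated the same as a precomposed "Á".
    text = unicodedata.normalize("NFC", text)
    return [w for w in LETTER_WORD.findall(text) if w[0].isupper()]

print(capitalized_words("#29   Mads Bech Sørensen"))
```

Unlike the `\S*` pattern, this version rejects words that contain digits or punctuation, which matches the requirement of no numbers or hyphens.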
