
Matching more than one word

I have the following phrases and I'd like to match the part before the colon:

"De la Sota: Hello" -> "De la Sota"

"Guini: Hello" -> "Guini"

"Prat Gay: Hello" -> "Prat Gay"

I'm using r"(\w+):" but it only matches the last word before the colon.

Simply use this pattern:

/^(.*):/gm

Now $1 contains what you need.
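Since the question is about Python, here is a minimal sketch of how that pattern might be written with the re module. The /gm flags are assumed to map to re.findall (which already returns every match) plus re.MULTILINE. The second sample line, with two colons, is a made-up input added only to show the difference between a greedy and a lazy quantifier.

```python
import re

# The middle line is a hypothetical input with a second colon in the message.
text = "De la Sota: Hello\nGuini: see: this\nPrat Gay: Hello"

# Greedy: .* runs to the end of the line, then backs up to the LAST colon.
greedy = re.findall(r"^(.*):", text, flags=re.MULTILINE)

# Lazy: .*? stops at the FIRST colon on each line.
lazy = re.findall(r"^(.*?):", text, flags=re.MULTILINE)

print(greedy)  # ['De la Sota', 'Guini: see', 'Prat Gay']
print(lazy)    # ['De la Sota', 'Guini', 'Prat Gay']
```

If the text after the colon can itself contain a colon, the lazy form is probably the safer choice.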


Note that I'm pretty sure there is a better approach than regex for this, but I'm not a Python expert.

str.split(":")[0] should work, where str is the string you'd like to split.

>>> str = "De la Sota: Hello" 
>>> str.split(":")[0]
'De la Sota'

This works by splitting the string into a list, using the argument as the delimiter. With the colon as the delimiter, the string is split into the pieces that the colon separates. The [0] then selects the first element of that list, which is the part you want.
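As a rough sketch, the same idea can be applied to all three phrases at once. The variable names here are illustrative, and the maxsplit argument of 1 and the call to strip() are additions, assuming you only care about the first colon and want surrounding whitespace removed.

```python
lines = ["De la Sota: Hello", "Guini: Hello", "Prat Gay: Hello"]

# split(":") returns every piece around the colon.
parts = lines[0].split(":")
print(parts)  # ['De la Sota', ' Hello']

# Keep only the piece before the first colon of each line.
names = [line.split(":", 1)[0].strip() for line in lines]
print(names)  # ['De la Sota', 'Guini', 'Prat Gay']
```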

Change \w+ to .+ or .*:

import re

input_text = ''' De la Sota: Hello

Guini: Hello

Prat Gay: Hello'''

# .+ cannot cross a newline, so each match ends at the last colon on its line
print(re.findall(r'(.+):', input_text))

"Prat Gay: Hello" -> "Prat Gay"

If that is exactly what your input looks like, you can use a negated character set to exclude the colon, whitespace (\s, or \t if it is a tab), and the letters of Hello (written as Helo, because a set only needs each character once). As for the names, some last names contain a hyphen, and a name needs more than one word character (\w), so the pattern starts with [-\w ]+:

import re
string = ''' De la Sota: Hello

Guini: Hello

Prat Gay: Hello
'''
print(re.findall(r'[-\w ]+[^:\sHelo]', string))

gives the following answer: 给出以下答案:

[' De la Sota', 'Guini', 'Prat Gay']
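One caveat worth checking: the negated set only constrains the final character of each match, so a name ending in H, e, l or o gets cut short. The sketch below uses a hypothetical name, "Guido", to show this, and compares it with an alternative pattern (not from the original answer) that captures everything before the first colon on each line.

```python
import re

original = r'[-\w ]+[^:\sHelo]'
# Alternative (an assumption, not the answer's pattern): skip leading
# spaces or tabs, then capture everything up to the first colon.
alternative = r'^[ \t]*([^:\n]+):'

# "Guido" is a made-up name ending in 'o'.
sample = "Guido: Hello\nPrat Gay: Hello"

truncated = re.findall(original, sample)
full = re.findall(alternative, sample, flags=re.MULTILINE)

print(truncated)  # ['Guid', 'Prat Gay']
print(full)       # ['Guido', 'Prat Gay']
```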

You should use re.findall rather than re.match: findall scans the entire string and returns every match, whereas match only tries to match at the very beginning of the string.
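The difference can be seen in a small sketch. The input here is the three phrases joined by newlines, without the leading space used in the earlier snippet.

```python
import re

text = "De la Sota: Hello\nGuini: Hello\nPrat Gay: Hello"
pattern = re.compile(r'[-\w ]+[^:\sHelo]')

# match() is anchored at position 0, so it can only find the first name.
first = pattern.match(text)
print(first.group())  # De la Sota

# findall() keeps scanning and collects every non-overlapping match.
everything = pattern.findall(text)
print(everything)  # ['De la Sota', 'Guini', 'Prat Gay']

# match() returns None when the string does not start with a match.
nothing = pattern.match(": Hello")
print(nothing)  # None
```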
