Pattern within a pattern?
I want to capture "Alta, Utah, USA" from "asd Alta, Utah, USA qwe". Basically, I'm trying to capture places from a text. It won't be a perfect method, but a place must start with a capitalized word, followed by a comma and another capitalized word.
So far, I have written:
\s[A-Z][a-z]+[,]?
I want to match multiple words, not just the first word, Alta. This is my attempt to use square brackets inside other square brackets:
[\s[A-Z][a-z]+[,]?]+
But that doesn't work, so it must be syntactically incorrect.
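For anyone puzzled by the same thing: square brackets define a character class, not a group, so they cannot be nested to repeat a sub-pattern. The sketch below shows how Python's re module appears to read the nested-bracket attempt (other engines may differ), next to a grouped version. The names nested and grouped, and the widening of [a-z] to [A-Za-z] so that "USA" can match, are assumptions made for illustration.

```python
import re
import warnings

text = "asd Alta, Utah, USA qwe"

# Inside [...], a "[" is treated as a literal character, so this reads as:
#   [\s[A-Z]   one character: whitespace, "[", or A-Z
#   [a-z]+     one or more lowercase letters
#   [,]?       an optional comma
#   ]+         one or more literal "]" characters
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # Python may warn about a possible nested set
    nested = re.compile(r"[\s[A-Z][a-z]+[,]?]+")

no_match = nested.search(text)          # None: the text contains no "]"
odd_match = nested.search("x Alta,]]")  # matches only because "]" follows

# A non-capturing group (?:...) is what repeats a whole sub-pattern.
grouped = re.compile(r"(?:\s[A-Z][A-Za-z]+,?)+")
phrase = grouped.search(text).group(0).strip()
print(phrase)  # Alta, Utah, USA
```

So the pattern is not a syntax error as such; it just describes something quite different from what was intended.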
Updated as per OP's comment:
(?:\s*[A-Z][A-Za-z]+[,\s])+
Original Answer:
\b([A-Z][a-zA-Z]+),?
Each match then has the place name in group 1.
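A quick way to compare the two patterns in this answer is to run both against the sample string. This is only a sketch using Python's re module; the variable names (updated, original, place, words) are made up for the example.

```python
import re

text = "asd Alta, Utah, USA qwe"

# Updated pattern: repeats "capitalized word followed by a comma or whitespace".
updated = re.compile(r"(?:\s*[A-Z][A-Za-z]+[,\s])+")
m = updated.search(text)
place = m.group(0).strip()  # the raw match includes surrounding whitespace
print(place)  # Alta, Utah, USA

# Original pattern: one capitalized word per match, captured in group 1.
original = re.compile(r"\b([A-Z][a-zA-Z]+),?")
words = original.findall(text)
print(words)  # ['Alta', 'Utah', 'USA']

# Caveat: every word must be followed by a comma or whitespace,
# so a place at the very end of the string loses its last word.
at_end = updated.search("asd Alta, Utah, USA").group(0).strip()
```

The updated pattern returns the whole place as one string, while the original returns the words one by one.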
I think this is what you need:
([A-Z][a-zA-Z]+)(,\s*([A-Z][a-zA-Z]+))*
Though the requirement @Rizwan pointed out in his comment still needs to be clarified.
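To see what this pattern returns, here is a small sketch in Python. The second pattern, strict, is a hypothetical variant and not part of the answer above: it uses a non-capturing group and + instead of * so that at least one comma is required, which keeps lone capitalized words from matching.

```python
import re

text = "asd Alta, Utah, USA qwe"

pattern = re.compile(r"([A-Z][a-zA-Z]+)(,\s*([A-Z][a-zA-Z]+))*")
m = pattern.search(text)
print(m.group(0))  # Alta, Utah, USA
print(m.group(1))  # Alta
print(m.group(3))  # USA (a repeated group keeps only its last iteration)

# Hypothetical variant: require at least one ", Word" after the first word.
strict = re.compile(r"[A-Z][a-zA-Z]+(?:,\s*[A-Z][a-zA-Z]+)+")
places = strict.findall("Hello from Alta, Utah, USA and Paris, France")
print(places)  # ['Alta, Utah, USA', 'Paris, France']
```

With no capturing groups in strict, findall returns the full matches directly.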
Just joining the party:
import re
dirty = "asd Alta, Utah, USA qwe"
p = re.compile(r"([A-Z][a-zA-Z]+)")  # one capitalized word per match
print(p.findall(dirty))
Output:
['Alta', 'Utah', 'USA']