Regular expression for multiple occurrences in python
I need to parse lines that contain multiple language codes, like the one below:
008800002 Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>
008800002 being an id
Bruxelles-Nord$Brüssel Nord$ being name one
deu being language one
$Brussel Noord$ being name two
nld being language two.
So the idea is that a name and a language can appear N times, and I need to collect them all.
The language code inside <> is always 3 characters long (fixed), and all names end with a $ sign.
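These two constraints are enough to describe one name/language pair without a catch-all `.*`. The following is a minimal sketch, not part of the original question, assuming Python 3 and that each line is an ID, whitespace, and then nothing but name/language pairs. `parse_line`, `PAIR` and `PAIRS_ONLY` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical pattern built only from the two stated constraints:
# a name is a run of non-"<" characters ending in "$",
# and a language code is exactly three characters inside "<...>".
PAIR = re.compile(r"([^<]*\$)<([^<>]{3})>")

# The remainder of the line must consist of one or more such pairs and nothing else.
PAIRS_ONLY = re.compile(r"(?:[^<]*\$<[^<>]{3}>)+")


def parse_line(line):
    """Return (id, [(name, lang), ...]) for one line (hypothetical helper)."""
    stop_id, rest = line.split(None, 1)  # ID, then everything after the first whitespace
    if not PAIRS_ONLY.fullmatch(rest):
        raise ValueError("unexpected name/language layout: %r" % rest)
    return stop_id, PAIR.findall(rest)


print(parse_line("008800002 Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>"))
```

Returning a list of tuples keeps the pairs in line order and would also keep a language that appears twice, which a dict keyed by language code would silently overwrite.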
I tried this one, but it does not give the expected output:
x = re.compile('(?P<stop_id>\d{9})\s(?P<authority>[[\x00-\x7F]{3}|\s{3}])\s(?P<stop_name>.*)(?P<lang_code>(?:[<]\S{0,4}))',flags=re.UNICODE)
I have no idea how to get the repeated elements. It takes
Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$
as stop_name and <nld> as the language.
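The wrong result comes from the greedy `.*` in the `stop_name` group. The sketch below assumes Python 3 and uses a simplified pattern without the `stop_id` and `authority` groups; it shows the greedy behaviour, why simply repeating a group does not collect every pair, and the lazy `findall` alternative.

```python
import re

line = "Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>"

# Greedy ".*" runs to the end of the line and backtracks only as far as the
# LAST "<...>", so every earlier pair ends up inside the name group.
greedy = re.search(r"(?P<name>.*)<(?P<lang>\w{3})>", line)
print(greedy.group("name"), greedy.group("lang"))
# Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$ nld

# Repeating a group with "+" matches every pair, but Python's re keeps only
# the capture from the last repetition, so the first pair is lost.
repeated = re.match(r"(?:(.*?)<(\w{3})>)+", line)
print(repeated.groups())
# ('$Brussel Noord$', 'nld')

# A lazy ".*?" combined with findall() yields one tuple per pair.
pairs = re.findall(r"(.*?)<(\w{3})>", line)
print(pairs)
# [('Bruxelles-Nord$Brüssel Nord$', 'deu'), ('$Brussel Noord$', 'nld')]
```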
Do it in two steps. First, separate the ID from the name/language pairs; then use
re.finditer
on the name/language section to iterate over the pairs and stuff them into a dict.
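import re

line = "008800002 Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>"

# Step 1: split the leading numeric ID from the name/language section
m = re.search(r"(\d+)\s+(.*)", line, re.UNICODE)
stop_id = m.group(1)

# Step 2: iterate over every "name<lang>" pair and map language code -> name
names = {}
for pair in re.finditer(r"(.*?)<(.*?)>", m.group(2), re.UNICODE):
    names[pair.group(2)] = pair.group(1)

print(stop_id, names)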
\b(\d+)\b\s*|(.*?)(?=<)<(.*?)>
Try this. Just grab the captures; see the demo:
http://regex101.com/r/hS3dT7/4
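With an alternation like this, each `finditer` match comes from exactly one branch, so the captures have to be read match by match. A minimal Python 3 sketch, assuming that a non-None group 1 marks the ID branch; `PATTERN`, `stop_id` and `names` are illustrative names, not taken from the answer.

```python
import re

# The alternation regex from the answer: branch 1 captures the ID (group 1),
# branch 2 captures one name (group 2) and its language code (group 3).
PATTERN = re.compile(r"\b(\d+)\b\s*|(.*?)(?=<)<(.*?)>")

line = "008800002 Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>"

stop_id = None
names = {}
for match in PATTERN.finditer(line):
    if match.group(1) is not None:
        # The ID branch matched
        stop_id = match.group(1)
    else:
        # The name/language branch matched
        names[match.group(3)] = match.group(2)

print(stop_id, names)
```

One caveat: a name that itself starts with digits followed by a word boundary (for example `2 Rue$`) would be picked up by the ID branch, so the two-step approach is more robust when names may begin with numbers.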