Regular expression with multiple occurrences in Python

Question

I need to parse lines that contain multiple language codes, like the one below:

008800002     Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>

008800002 is the id
Bruxelles-Nord$Brüssel Nord$ is name one
deu is language one
$Brussel Noord$ is name two
nld is language two

So the idea is that a name and a language can appear N times, and I need to collect them all. The language code inside <> is always 3 characters long, and every name ends with a $ sign.

I tried this, but it does not give the expected output:

x = re.compile('(?P<stop_id>\d{9})\s(?P<authority>[[\x00-\x7F]{3}|\s{3}])\s(?P<stop_name>.*)
    (?P<lang_code>(?:[<]\S{0,4}))',flags=re.UNICODE)

I have no idea how to capture the repeated elements. It takes

Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$ as stop_name and <nld> as the language.

Answer 1

Do it in two steps. First separate the ID from the name/language pairs; then use re.finditer on the name/language section to iterate over the pairs and put them into a dict.

import re

line = "008800002     Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>"
# Step 1: split the leading numeric id from the rest of the line.
m = re.search(r"(\d+)\s+(.*)", line)
stop_id = m.group(1)
# Step 2: collect every "names<lang>" pair, keyed by language code.
names = {}
for pair in re.finditer(r"(.*?)<(.*?)>", m.group(2)):
    names[pair.group(2)] = pair.group(1)
print(stop_id, names)
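
The two-step idea can also be wrapped in a small helper. What follows is only a sketch of one possible variant, not the answerer's code: the name `parse_stop_line` and the return shape are hypothetical. It assumes, as the question states, that each language code is exactly 3 characters and that names are terminated by `$`. It returns the pairs as a list, so order is kept and a repeated language code does not overwrite an earlier one.

```python
import re

# Hypothetical helper; the name and return shape are illustrative assumptions.
ID_RE = re.compile(r"(\d+)\s+(.*)")
# Lazy name section followed by a 3-character language code in angle brackets.
PAIR_RE = re.compile(r"(.*?)<([^>]{3})>")


def parse_stop_line(line):
    """Return (stop_id, [(names, lang), ...]) for one input line."""
    m = ID_RE.search(line)
    if m is None:
        raise ValueError("no leading numeric id found")
    stop_id, rest = m.groups()
    pairs = []
    for pair in PAIR_RE.finditer(rest):
        # Names are '$'-terminated; drop the empty pieces that split leaves behind.
        names = [n for n in pair.group(1).split("$") if n]
        pairs.append((names, pair.group(2)))
    return stop_id, pairs


line = "008800002     Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>"
stop_id, pairs = parse_stop_line(line)
print(stop_id, pairs)
```

For the sample line this yields two names for deu and one name for nld. Splitting on `$` is a design choice layered on top of the answer; drop that step if the raw `$`-delimited string is what you need.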

Answer 2

\b(\d+)\b\s*|(.*?)(?=<)<(.*?)>

Try this. Just grab the captures. See the demo:

http://regex101.com/r/hS3dT7/4
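
To make "grab the captures" concrete, here is a minimal sketch of how that pattern might be driven from Python. The pattern is the one given in this answer; the loop around it and the variable names are assumptions, not part of the answer. Each match fills either group 1 (the id) or groups 2 and 3 (a name section and its language code), so the loop checks which alternative matched.

```python
import re

# Pattern from the answer: alternative 1 captures the id (group 1),
# alternative 2 captures a name section (group 2) and its language (group 3).
PATTERN = re.compile(r"\b(\d+)\b\s*|(.*?)(?=<)<(.*?)>")

line = "008800002     Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>"

stop_id = None
names = {}
for m in PATTERN.finditer(line):
    if m.group(1) is not None:
        # The first alternative matched: this is the numeric id.
        stop_id = m.group(1)
    else:
        # The second alternative matched: key the name section by language.
        names[m.group(3)] = m.group(2)

print(stop_id, names)
```

As with the two-step approach, the name sections keep their `$` delimiters, so the second pair comes back as `$Brussel Noord$`.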


Answer 1: 3 (accepted), 2014-10-01 09:31:47
Answer 2: 2, 2014-10-01 09:30:55
