Cascading string split, the Pythonic way

Question

Take for example this format from IANA: http://www.iana.org/assignments/language-subtag-registry

%%
Type: language
Subtag: aa
Description: Afar
Added: 2005-10-16
%%
Type: language
Subtag: ab
Description: Abkhazian
Added: 2005-10-16
Suppress-Script: Cyrl
%%
Type: language
Subtag: ae
Description: Avestan
Added: 2005-10-16
%%

Say I open the file:

import urllib.request
f = urllib.request.urlopen("http://www.iana.org/assignments/language-subtag-registry")
all = f.read().decode("utf-8")  # note: `all` shadows the built-in of the same name

Normally you would do this:

lan=all.split("%%")

then iterate over lan and split("\n"), then iterate over the result and split(":"). Is there a way to do this in Python in one batch, without the iteration, with the output still looking like this: [[["Type","language"],["Subtag", "ae"],...]...]?

Answer 1

I don't see any sense in trying to do this in a single pass if the elements you get after each split are semantically different.

You could start by splitting on ":", which would get you to the fine-grained data, but what good would that be if you did not know where each piece of data belongs?

That said, you could put all the levels of separation inside a generator and have it yield dictionary objects with your data, ready for consumption:

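def iana_parse(data):
    for record in data.split("%%\n"):
        # skip empty records at file endings:
        if not record.strip():
            continue
        rec_data = {}
        # strip the record first so its trailing newline yields no empty line
        for line in record.strip().split("\n"):
            # split on the first colon only, since a value may contain colons
            key, value = line.split(":", 1)
            rec_data[key.strip()] = value.strip()
        yield rec_data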

It can be done as a one-liner, as you request in the comments, in the sense that it can be written as a single expression that fits on one line. That version took more time to write than the example above and would be nearly impossible to maintain. The code in the example above unfolds the logic in a few lines that are placed "out of the way", i.e. not inline where you are dealing with the actual data, which gives readability and maintainability for both tasks.

That said, parsing into a structure of nested lists, as you want, can be done thus:

structure = [[[token.strip() for token in line.split(":", 1)] for line in record.split("\n") if line.strip()] for record in data.split("%%") if record.strip()]
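To make the shape of that result concrete, here is a small sketch that applies the same nested comprehension to two records taken from the question's sample. The name sample, the filter on blank lines, and the maxsplit argument are assumptions added for illustration; the comprehension is spread over several lines only so that each level can carry a comment.

```python
# Two records copied from the question's sample; `sample` is a hypothetical name.
sample = """%%
Type: language
Subtag: aa
%%
Type: language
Subtag: ab
Suppress-Script: Cyrl
%%
"""

structure = [
    [
        # Split on the first colon only, in case a value contains one.
        [token.strip() for token in line.split(":", 1)]
        for line in record.split("\n")
        if line.strip()  # drop the blank lines left around each "%%"
    ]
    for record in sample.split("%%")
    if record.strip()  # drop the empty chunks at the start and end
]

print(structure[0])  # [['Type', 'language'], ['Subtag', 'aa']]
```

Each record becomes a list of [key, value] pairs, which is the nesting the question asks for.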

Answer 2

As a single comprehension:

raw = """\
%%
Type: language
Subtag: aa
Description: Afar
Added: 2005-10-16
%%
Type: language
Subtag: ab
Description: Abkhazian
Added: 2005-10-16
Suppress-Script: Cyrl
%%
Type: language
Subtag: ae
Description: Avestan
Added: 2005-10-16
%%"""


data = [
     dict(
         row.split(': ')
         for row in item_str.split("\n")
         if row  # required to avoid the empty lines which contained '%%'
     )
     for item_str in raw.split("%%") 
     if item_str  # required to avoid the empty items at the start and end
]

>>> data[0]['Added']
'2005-10-16'
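One caveat with building a dict from row.split(': ') is that a value containing ": " would produce more than two items and make dict() raise a ValueError. A possible variant, sketched below, uses str.partition so that only the first separator counts. The record and the helper name parse_row are made up for illustration and are not part of the answer above.

```python
# Hypothetical record: the "Comments" value is invented so that it contains ": ".
raw = """%%
Type: language
Subtag: aa
Comments: see also: ab
%%"""


def parse_row(row):
    # partition() splits at the first ": " only, so the rest of the value stays intact.
    key, _, value = row.partition(": ")
    return key, value


data = [
    dict(parse_row(row) for row in item_str.split("\n") if row)
    for item_str in raw.split("%%")
    if item_str
]

print(data[0]["Comments"])  # see also: ab
```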

Answer 3

Regexes, but I don't see the point:

import re
re.split(r'%%|:|\n', string)

Here multiple patterns are chained using the or operator, |.
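A quick sketch of what such a call returns on a short sample may explain the reservation: every delimiter is treated alike, so the result is one flat list and the record boundaries are gone. The names sample, tokens and cleaned are assumptions used only for this illustration.

```python
import re

# One record from the question's sample, written as a single string.
sample = "%%\nType: language\nSubtag: aa\n%%\n"

# Split on any of the three delimiters at once.
tokens = re.split(r"%%|:|\n", sample)
print(tokens)
# ['', '', 'Type', ' language', 'Subtag', ' aa', '', '', '']

# Dropping the empty strings still leaves a flat list with no nesting.
cleaned = [t.strip() for t in tokens if t.strip()]
print(cleaned)
# ['Type', 'language', 'Subtag', 'aa']
```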

Answer 4

You can use itertools.groupby:

ss = """%%
Type: language
Subtag: aa
Description: Afar
Added: 2005-10-16
%%
Type: language
Subtag: ab
Description: Abkhazian
Added: 2005-10-16
Suppress-Script: Cyrl
%%
Type: language
Subtag: ae
Description: Avestan
Added: 2005-10-16
"""
sss = ss.splitlines(True) #List which looks like you're iterating over a file object


import itertools

output = []
for k, v in itertools.groupby(sss, lambda x: x.strip() == '%%'):
    if k:  # Hit a '%%' separator. Need a new group.
        print("\nNew group:\n")
        current = {}
        output.append(current)
    else:  # Regular lines: write the data to our current record dict.
        for line in v:
            print(line.strip())
            # Split on the first colon so the key carries no trailing ':'.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()

One benefit of this answer is that it doesn't require you to read the entire file at once. The grouping is evaluated lazily.
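The lazy behaviour can be made explicit by wrapping the same groupby idea in a generator. This is only a sketch: the function name iter_records is hypothetical, and io.StringIO stands in for an open file or HTTP response.

```python
import io
import itertools


def iter_records(lines):
    """Yield one dict per record, pulling lines from `lines` only as needed."""
    groups = itertools.groupby(lines, lambda line: line.strip() == "%%")
    for is_separator, group in groups:
        if is_separator:
            continue  # skip the '%%' lines themselves
        record = {}
        for line in group:
            # partition() splits at the first colon only.
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        yield record


# io.StringIO is a stand-in for a real file object here.
fake_file = io.StringIO(
    "%%\nType: language\nSubtag: aa\n%%\nType: language\nSubtag: ab\n%%\n"
)

first = next(iter_records(fake_file))
print(first)  # {'Type': 'language', 'Subtag': 'aa'}
```

Because the generator yields records one at a time, a caller can stop after the first match without parsing the rest of the input.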

4 answers

Answer 1
3 2012-09-17 13:50:36

Answer 2
3 accepted 2013-09-10 10:17:22

Answer 3
2 2012-09-17 13:55:15

Answer 4
2 2012-09-17 13:59:48
