Make tuples or a new list from a list

I have a list which comes from a text file that I have parsed using very primitive regular expressions. I would like to reorganize it into a more spartan list that contains only the files that are immediately followed by a date. I've tried looping through the list using len(), but that only extracts the files and not the next entry. Many thanks in advance.

This:

2014-01-28

part002.csv.gz

2014-01-28

part001.csv.gz

2014-01-28

2014-01-28

2014-01-27

2014-01-27

2014-01-26

2014-01-26

2014-01-25

part002.csv.gz

2014-01-25

Becomes this:

part002.csv.gz

2014-01-28

part001.csv.gz

2014-01-28

part002.csv.gz

2014-01-25

You can use a list comprehension:

filtered = [e for i, e in enumerate(l) if not isDate(e) or (i > 0 and not isDate(l[i-1]))]

Complete example:

l = ['2014-01-28', 'part002.csv.gz', '2014-01-28', 'part001.csv.gz', '2014-01-28', '2014-01-28', '2014-01-27', 'part002.csv.gz', '2014-01-25']

def isDate (s):
    return '.' not in s

filtered = [e for i, e in enumerate(l) if not isDate(e) or (i > 0 and not isDate(l[i-1]))]

print (filtered)

Explained:

l is our original list.

isDate takes a string and tests whether it is a date. In my simple example it just checks that the string doesn't contain a period; for better results use a regex or strptime.
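As a minimal sketch of the regex option mentioned above: the name `is_date_strict` and the YYYY-MM-DD pattern are assumptions based on the sample data in the question, not part of the original answer. It checks only the shape of the string, not whether the date exists on the calendar.

```python
import re

# Assumed format: four-digit year, two-digit month, two-digit day,
# as in the sample data (e.g. '2014-01-28').
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_date_strict(s):
    """Return True if s has the shape YYYY-MM-DD (shape only, no calendar check)."""
    return DATE_PATTERN.fullmatch(s) is not None

print(is_date_strict("2014-01-28"))      # True
print(is_date_strict("part002.csv.gz"))  # False
print(is_date_strict("README"))          # False, although it contains no period
```

Unlike the period test, this does not mistake a file name without an extension for a date.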

enumerate enumerates a list (or any iterable; I will stick to the word list so as not to get too technical). It yields tuples, each containing the index and the element of the list passed to enumerate (in Python 3 it returns a lazy iterator rather than an actual list). For instance, list(enumerate(['a', None, 3])) gives [(0, 'a'), (1, None), (2, 3)].

for i, e in ... unpacks each tuple, assigning the index to i and the element to e.
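A small sketch of enumerate and tuple unpacking on their own may help here. The variable names `items` and `pairs` are illustrative only.

```python
items = ['a', None, 3]

# In Python 3, enumerate is lazy; list() materializes the (index, element) pairs.
pairs = list(enumerate(items))
print(pairs)  # [(0, 'a'), (1, None), (2, 3)]

# Unpacking each pair into an index and an element while looping.
for i, e in enumerate(items):
    print(i, e)
```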

A list comprehension works like this (simplified): [x for x in somewhere if cond(x)] returns a list of all elements of somewhere that satisfy the condition cond(x).

In our case we add an element to the filtered list only if it is not a date (not the fruit), i.e. not isDate(e), or if it is not the first element (i > 0) and at the same time its predecessor is not a date, i.e. not isDate(l[i-1]) (that is, the predecessor is a file).

In pseudocode:

Take list `l`
Let our filtered list be an empty list
For each item in `l` do
    let `i` be the index of the item
    let `e` be the item itself

    if `e` is not a Date
      or if `i` > 0 (i.e. it is not the first item)
      and at the same time the preceding item is a File
      then and only then add `e` to our filtered list.
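The pseudocode above can be sketched as an explicit loop. The function names `is_date` and `keep_files_with_dates` are illustrative, and `is_date` reuses the simplistic period test from the example, so this is a sketch under that assumption rather than a robust implementation.

```python
def is_date(s):
    # Same simplistic test as in the example: no period means "date".
    return '.' not in s

def keep_files_with_dates(items):
    filtered = []
    for i, e in enumerate(items):
        # Keep every file, and keep a date only when the item before it is a file.
        if not is_date(e) or (i > 0 and not is_date(items[i - 1])):
            filtered.append(e)
    return filtered

data = ['2014-01-28', 'part002.csv.gz', '2014-01-28', 'part001.csv.gz',
        '2014-01-28', '2014-01-28', '2014-01-27', 'part002.csv.gz', '2014-01-25']
print(keep_files_with_dates(data))
```

Note that this condition keeps every file, including one that is not followed by a date (for example a file at the very end of the list). If that can occur in the data, the condition needs an extra check on the following item.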

Store the previous line as you iterate, so you always have it when you need it:

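previous_line = None
newlist = []
for line in lines:
    # keep a file only when the line right after it is a date,
    # and append both the file and that date
    if previous_line is not None and not isdate(previous_line) and isdate(line):
        newlist.extend([previous_line, line])
    previous_line = line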

Defining isdate:

import datetime

def isdate(s):
    # True if s parses as a YYYY-MM-DD date, False otherwise
    try:
        datetime.datetime.strptime(s, '%Y-%m-%d')
    except (TypeError, ValueError):
        return False
    else:
        return True
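Putting the two pieces together, here is a self-contained sketch of the previous-line approach run against the data from the question. The wrapper function `files_with_dates` is a name introduced here for illustration. It keeps a file only when the very next line is a date and emits both.

```python
import datetime

def isdate(s):
    """True if s parses as a YYYY-MM-DD date."""
    try:
        datetime.datetime.strptime(s, '%Y-%m-%d')
    except (TypeError, ValueError):
        return False
    return True

def files_with_dates(lines):
    previous_line = None
    newlist = []
    for line in lines:
        # A file is kept only if the current line is the date right after it.
        if previous_line is not None and not isdate(previous_line) and isdate(line):
            newlist.extend([previous_line, line])
        previous_line = line
    return newlist

lines = ['2014-01-28', 'part002.csv.gz', '2014-01-28', 'part001.csv.gz',
         '2014-01-28', '2014-01-28', '2014-01-27', '2014-01-27',
         '2014-01-26', '2014-01-26', '2014-01-25', 'part002.csv.gz', '2014-01-25']
print(files_with_dates(lines))
```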

Working through it:

s = """
#that long string, snipped
"""

li = [x for x in s.splitlines() if x]

li
Out[3]: 
['2014-01-28',
 'part002.csv.gz',
 '2014-01-28',
 'part001.csv.gz',
 '2014-01-28',
 '2014-01-28',
 '2014-01-27',
 '2014-01-27',
 '2014-01-26',
 '2014-01-26',
 '2014-01-25',
 'part002.csv.gz',
 '2014-01-25']

[tup for tup in zip(li,li[1:]) if 'csv' in tup[0]] # shown for didactic purposes; a generator expression is used below
Out[7]: 
[('part002.csv.gz', '2014-01-28'),
 ('part001.csv.gz', '2014-01-28'),
 ('part002.csv.gz', '2014-01-25')]

The actual answer:

from itertools import chain

list(chain.from_iterable(tup for tup in zip(li,li[1:]) if 'csv' in tup[0]))
Out[9]: 
['part002.csv.gz',
 '2014-01-28',
 'part001.csv.gz',
 '2014-01-28',
 'part002.csv.gz',
 '2014-01-25']

Essentially: zip (in Python 2, use izip) the list together with itself, advanced by one index. Iterate over the pairwise tuples, filtering out those that don't have a file-like string as their first element. Lastly, flatten the tuples into a list using itertools.chain to achieve your desired output.
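If the file names do not always contain 'csv', the same pairing idea can test for dates instead. The following is a sketch under that assumption; `is_date` and `file_date_pairs` are names introduced here, not part of the answer above. It returns the tuples the title asks about and then flattens them.

```python
from datetime import datetime
from itertools import chain

def is_date(s):
    """True if s parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(s, '%Y-%m-%d')
    except ValueError:
        return False
    return True

def file_date_pairs(items):
    # Pair each item with its successor; keep (file, date) pairs only.
    return [(a, b) for a, b in zip(items, items[1:])
            if not is_date(a) and is_date(b)]

li = ['2014-01-28', 'part002.csv.gz', '2014-01-28', 'part001.csv.gz',
      '2014-01-28', '2014-01-28', '2014-01-27', '2014-01-27',
      '2014-01-26', '2014-01-26', '2014-01-25', 'part002.csv.gz', '2014-01-25']

pairs = file_date_pairs(li)
flat = list(chain.from_iterable(pairs))
print(pairs)
print(flat)
```

Keeping the tuples is handy if the file and its date are used together later; flatten only when the flat list is really what is needed.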
