What is the most Pythonic way to sort sequences of dates?

Question

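I have a list of strings, each representing a month of a year (unsorted and not consecutive): ['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013', '10/2013', '11/2013', '1/2014', '2/2014']

I'm looking for a Pythonic way to sort all of them and separate each consecutive sequence, like this: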

[ ['1/2013', '2/2013', '3/2013', '4/2013'], 
  ['7/2013'], 
  ['10/2013', '11/2013', '12/2013', '1/2014', '2/2014'] 
]

Any ideas?

Answer 1

Based on the example in the docs that shows how to find runs of consecutive numbers using itertools.groupby():

from itertools import groupby
from pprint import pprint

def month_number(date):
    month, year = date.split('/')
    return int(year) * 12 + int(month)

L = [[date for _, date in run]
     for _, run in groupby(enumerate(sorted(months, key=month_number)),
                           # tuple-unpacking lambdas are Python 2 only, so index the pair
                           key=lambda pair: pair[0] - month_number(pair[1]))]
pprint(L)

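The key to the solution is differencing against the range generated by enumerate(), so that consecutive months all end up in the same group (run).

Output: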

[['1/2013', '2/2013', '3/2013'],
 ['7/2013'],
 ['10/2013', '11/2013', '12/2013', '1/2014', '2/2014'],
 ['4/2014']]
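
To make the differencing trick more concrete, here is a small sketch that prints nothing but computes the intermediate values. The shortened input list and the names ordered, offsets and runs are illustrative assumptions, not part of the answer. Within a run of consecutive months the index and the month number grow in step, so their difference stays constant; a gap changes it.

```python
from itertools import groupby

def month_number(date):
    # '7/2013' -> 2013 * 12 + 7, so consecutive months differ by exactly 1
    month, year = date.split('/')
    return int(year) * 12 + int(month)

# Illustrative input, assumed to contain no duplicate months
months = ['1/2013', '2/2013', '3/2013', '7/2013', '12/2013', '1/2014']
ordered = sorted(months, key=month_number)

# index minus month number: constant inside a run, different across a gap
offsets = [i - month_number(d) for i, d in enumerate(ordered)]

# groupby starts a new group each time the offset changes
runs = [[d for _, d in run]
        for _, run in groupby(enumerate(ordered),
                              key=lambda pair: pair[0] - month_number(pair[1]))]
```

Here the first three entries of offsets are equal, the fourth differs, and the last two are equal again, which is why runs ends up with three groups.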

Answer 2

The groupby example is cute, but it is too dense and will break on this input: ['1/2013', '2/2017'], i.e. adjacent months in non-adjacent years.

from datetime import datetime
from dateutil.relativedelta import relativedelta

def areAdjacent(old, new):
    return old + relativedelta(months=1) == new

def parseDate(s):
    return datetime.strptime(s, '%m/%Y')

def generateGroups(seq):
    group = []
    last = None
    for (current, formatted) in sorted((parseDate(s), s) for s in seq):
        if group and last is not None and not areAdjacent(last, current):
            yield group
            group = []
        group.append(formatted)
        last = current
    if group:
        yield group

Result:

[['1/2013', '2/2013', '3/2013'], 
 ['7/2013'],
 ['10/2013', '11/2013', '12/2013', '1/2014', '2/2014'],
 ['4/2014']]
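
The generator above depends on the third-party dateutil package. If that is not available, the same idea can probably be sketched with the standard library alone. The snake_case names and the next_month helper below are assumptions made for this illustration, not part of the answer.

```python
from datetime import datetime

def parse_date(s):
    # '7/2013' -> datetime(2013, 7, 1); the day defaults to 1
    return datetime.strptime(s, '%m/%Y')

def next_month(dt):
    # dt.month is 1-based, so divmod(dt.month, 12) yields the year carry and
    # the 0-based index of the following month (December -> carry 1, index 0)
    years, month0 = divmod(dt.month, 12)
    return dt.replace(year=dt.year + years, month=month0 + 1)

def generate_groups(seq):
    group = []
    last = None
    for current, formatted in sorted((parse_date(s), s) for s in seq):
        # close the current group when this month does not directly follow the last
        if group and next_month(last) != current:
            yield group
            group = []
        group.append(formatted)
        last = current
    if group:
        yield group

months = ['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013',
          '10/2013', '11/2013', '1/2014', '2/2014']
groups = list(generate_groups(months))
```

Because whole dates are compared rather than bare month numbers, an input such as ['1/2013', '2/2017'] is split into two groups.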

Answer 3

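If you only want to sort the list, use the sorted function and pass as key= a function that converts the date string to a Python datetime object, such as lambda d: datetime.strptime(d, '%m/%Y'). The following example uses your list as L: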

>>> from datetime import datetime
>>> sorted(L, key = lambda d: datetime.strptime(d, '%m/%Y'))
['1/2013', '2/2013', '3/2013', '7/2013', '10/2013', 
 '11/2013', '12/2013', '1/2014', '2/2014', '4/2014'] # indented by hand

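To split the list of month/year strings into lists of consecutive months, you can use the following script (read the comments). I first sort the list L, then group the strings by consecutive months (I wrote a function for the consecutiveness check):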

def is_cm(d1, d2):
    """ is consecutive month pair?
        : Assumption d1 is older day's date than d2
    """
    d1 = datetime.strptime(d1, '%m/%Y')
    d2 = datetime.strptime(d2, '%m/%Y') 

    y1, y2 = d1.year, d2.year
    m1, m2 = d1.month, d2.month

    if y1 == y2: # if years are same d2 should be in next month
        return (m2 - m1) == 1
    elif (y2 - y1) == 1: # if years are consecutive
        return (m1 == 12 and m2 == 1)

It works as follows:

>>> is_cm('1/2012', '2/2012')
True # yes, consecutive
>>> is_cm('12/2012', '1/2013')
True # yes, consecutive
>>> is_cm('1/2015', '12/2012') # None --> # not consecutive
>>> is_cm('12/2012', '2/2013')
False # not consecutive

Code to split the list:

def result(dl):
    """
    dl: dates list - a iterator of 'month/year' strings
    type: list of strings

    returns: list of lists of strings
    """
    #Sort list:
    s_dl = sorted(dl, key=lambda d: datetime.strptime(d, '%m/%Y'))
    r_dl = [] # list to be return
    # split list into list of lists
    t_dl = [s_dl[0]] # temp list
    for d in s_dl[1:]:
        if not is_cm(t_dl[-1], d): # check if months are not consecutive
            r_dl.append(t_dl)
            t_dl = [d]
        else:
            t_dl.append(d)
    r_dl.append(t_dl) # the loop never flushes the last group, so add it here
    return r_dl

result(L)

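Don't forget to include from datetime import datetime. With this technique I believe you can easily adapt the code to lists of dates in other formats.

After @9000's hint I was able to simplify my sorting function and removed the old answer; if you want to see the old script, check it @codepad.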
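
As a rough illustration of adapting the approach to another date format, the sketch below takes the strptime format as a parameter. The function name group_consecutive and the fmt argument are hypothetical, and the grouping is rewritten around a month index instead of is_cm to keep the example short.

```python
from datetime import datetime

def group_consecutive(dates, fmt='%m/%Y'):
    """Group date strings written in `fmt` into runs of consecutive months."""
    parsed = sorted((datetime.strptime(d, fmt), d) for d in dates)
    groups = []
    prev_index = None
    for dt, text in parsed:
        # a single integer per month, used only to compare neighbours
        index = dt.year * 12 + dt.month
        if prev_index is not None and index - prev_index == 1:
            groups[-1].append(text)   # directly follows the previous month
        else:
            groups.append([text])     # first item or a gap: start a new run
        prev_index = index
    return groups

iso_groups = group_consecutive(['2013-12', '2013-01', '2014-01', '2013-02'],
                               fmt='%Y-%m')
```

Note that in this sketch a duplicated month starts a new run, since its index does not differ from the previous one by exactly 1.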

Answer 4

A simple solution in this specific case (not many elements) is just to iterate over all the months:

year = dates[0].split('/')[1]
result = []
current = []
for i in range(1, 13):
    x = "%i/%s" % (i, year)
    if x in dates:
        current.append(x)
        if len(current) == 1:
            result.append(current)
    else:
        current = []
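
The loop above only scans the twelve months of the first date's year. One possible extension, sketched here under the assumption that the strings are never zero-padded (so '1/2013', not '01/2013'), walks every month from the earliest to the latest date. The name group_by_scanning is hypothetical.

```python
def group_by_scanning(dates):
    if not dates:
        return []
    present = set(dates)  # set membership is cheaper than scanning the list
    pairs = [(int(y), int(m)) for m, y in (d.split('/') for d in dates)]
    (year, month), last = min(pairs), max(pairs)
    result = []
    current = []
    while (year, month) <= last:
        label = "%d/%d" % (month, year)
        if label in present:
            if not current:
                result.append(current)  # register a new run once, then keep filling it
            current.append(label)
        else:
            current = []                # a missing month ends the run
        # advance to the next calendar month
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result

dates = ['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013',
         '10/2013', '11/2013', '1/2014', '2/2014']
scanned = group_by_scanning(dates)
```

Like the original, this is only reasonable when the span between the first and last date is small, since it visits every month in between.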

Answer 5

Well, here is one without itertools, as short as I could make it without hurting readability. The trick is to use zip. This is basically @moe's answer unwrapped a bit.

def parseAsPair(piece):
  """Transforms things like '7/2014' into (2014, 7) """
  m, y = piece.split('/')
  return (int(y), int(m))

def goesAfter(earlier, later):
  """Returns True iff later goes right after earlier."""
  earlier_y, earlier_m = earlier
  later_y, later_m = later
  if earlier_y == later_y:  # same year?
    return later_m == earlier_m + 1 # next month
  else: # next year? must be Dec -> Jan
    return later_y == earlier_y + 1 and earlier_m == 12 and later_m == 1

def groupSequentially(months):
  result = []  # final result
  if months:
    sorted_months = sorted(months, key=parseAsPair)
    span = [sorted_months[0]]  # current span; has at least the first month
    for earlier, later in zip(sorted_months, sorted_months[1:]):
      if not goesAfter(parseAsPair(earlier), parseAsPair(later)):
        # current span is over
        result.append(span)
        span = []
      span.append(later)
    # last span was not appended because sequence ended without breaking
    result.append(span)
  return result

Let's try it:

months =['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013',
         '10/2013', '11/2013', '1/2014', '2/2014']

print(groupSequentially(months))  # output wrapped manually

[['1/2013', '2/2013', '3/2013'], 
 ['7/2013'], 
 ['10/2013', '11/2013', '12/2013', '1/2014', '2/2014'], 
 ['4/2014']]

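We could save some performance and cognitive load if we mapped parseAsPair over the list. Then every call to parseAsPair could be removed from groupSequentially, but we would have to convert the result back to strings.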
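
A minimal sketch of that variant might look like the following. The helper formatPair is an assumption added for this illustration; it rebuilds the strings without zero padding, so it only round-trips inputs written like '7/2014'.

```python
def parseAsPair(piece):
    """Transforms things like '7/2014' into (2014, 7)."""
    m, y = piece.split('/')
    return (int(y), int(m))

def formatPair(pair):
    """Hypothetical inverse of parseAsPair: (2014, 7) -> '7/2014'."""
    y, m = pair
    return '%d/%d' % (m, y)

def goesAfter(earlier, later):
    """Returns True iff later goes right after earlier."""
    earlier_y, earlier_m = earlier
    later_y, later_m = later
    if earlier_y == later_y:
        return later_m == earlier_m + 1
    return later_y == earlier_y + 1 and earlier_m == 12 and later_m == 1

def groupSequentially(months):
    pairs = sorted(map(parseAsPair, months))  # parse once, sort as (year, month)
    result = []
    if pairs:
        span = [pairs[0]]
        for earlier, later in zip(pairs, pairs[1:]):
            if not goesAfter(earlier, later):
                result.append(span)  # current span is over
                span = []
            span.append(later)
        result.append(span)          # flush the last span
    # convert the pairs back to 'month/year' strings
    return [[formatPair(p) for p in span] for span in result]

months = ['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013',
          '10/2013', '11/2013', '1/2014', '2/2014']
grouped = groupSequentially(months)
```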

5 answers

Answer 1: score 4, accepted, 2014-04-15 06:21:46
Answer 2: score 2, 2014-04-15 06:22:03
Answer 3: score 1, 2014-04-15 06:21:07
Answer 4: score 0, 2014-04-15 05:52:37
Answer 5: score 0, 2014-04-15 16:54:18
