What is the most Pythonic way to sort date sequences?
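I have a list of strings, each representing a month of a year (unsorted and not consecutive): ['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013', '10/2013', '11/2013', '1/2014', '2/2014']
I am looking for a Pythonic way to sort all of them and split them into consecutive sequences, like this: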
[ ['1/2013', '2/2013', '3/2013', '4/2013'],
['7/2013'],
['10/2013', '11/2013', '12/2013', '1/2014', '2/2014']
]
Any ideas?
Based on the example from the docs that shows how to find runs of consecutive numbers using itertools.groupby():
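from itertools import groupby
from pprint import pprint

def month_number(date):
    # Map 'M/YYYY' to one integer so that consecutive months differ by 1.
    month, year = date.split('/')
    return int(year) * 12 + int(month)

# `months` is the list from the question.
# The lambda takes a single (index, date) pair; tuple parameters are Python 2 only.
L = [[date for _, date in run]
     for _, run in groupby(enumerate(sorted(months, key=month_number)),
                           key=lambda pair: pair[0] - month_number(pair[1]))]
pprint(L)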
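The key to the solution is taking the difference with the range generated by enumerate(), so that consecutive months all end up in the same group (run).
Output: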
[['1/2013', '2/2013', '3/2013'],
['7/2013'],
['10/2013', '11/2013', '12/2013', '1/2014', '2/2014'],
['4/2014']]
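The differencing trick may be easier to see on plain integers. The following is a minimal sketch, not part of the original answer; the function name consecutive_runs is made up for illustration. Within a run of consecutive values the index and the value both grow by 1, so index minus value stays constant and groupby() keeps those items together.

```python
from itertools import groupby

def consecutive_runs(numbers):
    """Split a sorted list of integers into runs of consecutive values."""
    runs = []
    # Inside a run, index and value both increase by 1, so their
    # difference is constant; it changes as soon as there is a gap.
    for _, run in groupby(enumerate(numbers), key=lambda pair: pair[0] - pair[1]):
        runs.append([value for _, value in run])
    return runs

print(consecutive_runs([1, 2, 3, 7, 10, 11, 12]))
# prints [[1, 2, 3], [7], [10, 11, 12]]
```

month_number() plays the role of the integer value here: it turns each 'M/YYYY' string into a running month count before the same grouping is applied.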
The groupby example is cute, but it is too dense, and it will break on this input: ['1/2013', '2/2017'], i.e. adjacent months of non-adjacent years.
from datetime import datetime
from dateutil.relativedelta import relativedelta  # third-party package: python-dateutil

def areAdjacent(old, new):
    return old + relativedelta(months=1) == new

def parseDate(s):
    return datetime.strptime(s, '%m/%Y')

def generateGroups(seq):
    group = []
    last = None
    for (current, formatted) in sorted((parseDate(s), s) for s in seq):
        if group and last is not None and not areAdjacent(last, current):
            yield group
            group = []
        group.append(formatted)
        last = current
    if group:
        yield group
Result:
[['1/2013', '2/2013', '3/2013'],
['7/2013'],
['10/2013', '11/2013', '12/2013', '1/2014', '2/2014'],
['4/2014']]
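If installing python-dateutil is not an option, the same generator can probably be written with integer arithmetic only. This is a sketch under that assumption, not the answer's own code; month_index and generate_groups_stdlib are hypothetical names. Two months are treated as adjacent when their running month counts differ by exactly 1.

```python
def month_index(s):
    """Map 'M/YYYY' to a running month count (hypothetical helper)."""
    month, year = s.split('/')
    return int(year) * 12 + int(month)

def generate_groups_stdlib(seq):
    """Yield lists of consecutive 'M/YYYY' strings in chronological order."""
    group = []
    last = None
    for formatted in sorted(seq, key=month_index):
        current = month_index(formatted)
        # Start a new group when the gap to the previous month is not exactly 1.
        if group and current - last != 1:
            yield group
            group = []
        group.append(formatted)
        last = current
    if group:
        yield group

months = ['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013',
          '10/2013', '11/2013', '1/2014', '2/2014']
print(list(generate_groups_stdlib(months)))
```

Because it is a generator, wrap the call in list() when a list of lists is needed.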
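If you only want to sort the list, use the sorted function and pass as key= a function that converts a date string into a Python datetime object, such as lambda d: datetime.strptime(d, '%m/%Y'). In the code below your list is L.
Example: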
>>> from datetime import datetime
>>> sorted(L, key = lambda d: datetime.strptime(d, '%m/%Y'))
['1/2013', '2/2013', '3/2013', '7/2013', '10/2013',
'11/2013', '12/2013', '1/2014', '2/2014', '4/2014'] # indented by hand
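To split the list of month/year strings into lists of consecutive months, you can use the following script (read the comments). First I sort the list L, then I group the strings by consecutive months (I wrote a function for the consecutiveness check):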
def is_cm(d1, d2):
    """ is consecutive month pair?
        : Assumption d1 is older day's date than d2
    """
    d1 = datetime.strptime(d1, '%m/%Y')
    d2 = datetime.strptime(d2, '%m/%Y')
    y1, y2 = d1.year, d2.year
    m1, m2 = d1.month, d2.month
    if y1 == y2:  # if years are same d2 should be in next month
        return (m2 - m1) == 1
    elif (y2 - y1) == 1:  # if years are consecutive
        return (m1 == 12 and m2 == 1)
    return False  # any other year gap is not consecutive
It works as follows:
>>> is_cm('1/2012', '2/2012')
True # yes, consecutive
>>> is_cm('12/2012', '1/2013')
True # yes, consecutive
>>> is_cm('1/2015', '12/2012')
False # not consecutive
>>> is_cm('12/2012', '2/2013')
False # not consecutive
Code to split the list:
def result(dl):
    """
    dl: dates list - an iterable of 'month/year' strings
    type: list of strings
    returns: list of lists of strings
    """
    # Sort list:
    s_dl = sorted(dl, key=lambda d: datetime.strptime(d, '%m/%Y'))
    if not s_dl:  # nothing to group
        return []
    r_dl = []  # list to be returned
    # split list into list of lists
    t_dl = [s_dl[0]]  # temp list
    for d in s_dl[1:]:
        if not is_cm(t_dl[-1], d):  # check if months are not consecutive
            r_dl.append(t_dl)
            t_dl = [d]
        else:
            t_dl.append(d)
    r_dl.append(t_dl)  # append the last group, otherwise it is lost
    return r_dl
result(L)
Don't forget to include from datetime import datetime. With this technique I believe you can easily adapt the code to lists of dates in other formats.
In this specific case (not many elements), a simple solution is just to iterate over all the months:
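# Note: only the year of the first element is scanned.
year = dates[0].split('/')[1]
result = []
current = []
for i in range(1, 13):
    x = "%i/%s" % (i, year)
    if x in dates:
        current.append(x)
        if len(current) == 1:
            result.append(current)  # a new run has just started
    else:
        current = []  # gap: the next hit starts a new run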
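The loop above only looks at one year. One way to cover the whole input, sketched here as an assumption rather than as the answer's method, is to scan every month from the earliest to the latest year present. The function name group_by_scanning is hypothetical, and the sketch assumes months are written without leading zeros ('1/2013', not '01/2013').

```python
def group_by_scanning(dates):
    """Scan every month between the earliest and latest year in `dates`."""
    if not dates:
        return []
    present = set(dates)  # set lookup avoids rescanning the list each time
    years = [int(d.split('/')[1]) for d in dates]
    result = []
    current = None  # the run being built, or None while inside a gap
    for year in range(min(years), max(years) + 1):
        for month in range(1, 13):
            x = "%d/%d" % (month, year)
            if x in present:
                if current is None:  # first month after a gap starts a new run
                    current = []
                    result.append(current)
                current.append(x)
            else:
                current = None
    return result

dates = ['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013',
         '10/2013', '11/2013', '1/2014', '2/2014']
print(group_by_scanning(dates))
```

The run is not reset at a year boundary, so December followed by January stays in one group. The cost grows with the span of years rather than with the number of elements, which fits the "not many elements" case described above.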
Well, here is one without itertools, as short as I could make it without hurting readability. The trick is to use zip. This is basically @moe's answer unrolled a bit.
def parseAsPair(piece):
    """Transforms things like '7/2014' into (2014, 7) """
    m, y = piece.split('/')
    return (int(y), int(m))

def goesAfter(earlier, later):
    """Returns True iff later goes right after earlier."""
    earlier_y, earlier_m = earlier
    later_y, later_m = later
    if earlier_y == later_y:  # same year?
        return later_m == earlier_m + 1  # next month
    else:  # next year? must be Dec -> Jan
        return later_y == earlier_y + 1 and earlier_m == 12 and later_m == 1

def groupSequentially(months):
    result = []  # final result
    if months:
        sorted_months = sorted(months, key=parseAsPair)
        span = [sorted_months[0]]  # current span; has at least the first month
        for earlier, later in zip(sorted_months, sorted_months[1:]):
            if not goesAfter(parseAsPair(earlier), parseAsPair(later)):
                # current span is over
                result.append(span)
                span = []
            span.append(later)
        # last span was not appended because sequence ended without breaking
        result.append(span)
    return result
Try it:
months = ['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013',
          '10/2013', '11/2013', '1/2014', '2/2014']
print(groupSequentially(months))  # output wrapped manually
[['1/2013', '2/2013', '3/2013'],
['7/2013'],
['10/2013', '11/2013', '12/2013', '1/2014', '2/2014'],
['4/2014']]
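If we mapped parseAsPair over the list first, we could save some performance and cognitive load. Every call to parseAsPair could then be removed from groupSequentially, but we would have to convert the result back to strings.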
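That variant might look roughly like the sketch below. It is one possible reading of the idea, not the author's code: format_pair and group_sequentially_parsed are hypothetical names, and converting back to strings assumes the input has no leading zeros (an input of '07/2014' would come back as '7/2014').

```python
def parse_as_pair(piece):
    """Turn '7/2014' into (2014, 7)."""
    m, y = piece.split('/')
    return (int(y), int(m))

def format_pair(pair):
    """Turn (2014, 7) back into '7/2014' (hypothetical inverse helper)."""
    y, m = pair
    return "%d/%d" % (m, y)

def goes_after(earlier, later):
    """Return True iff `later` is the month right after `earlier`."""
    earlier_y, earlier_m = earlier
    later_y, later_m = later
    if earlier_y == later_y:
        return later_m == earlier_m + 1
    return later_y == earlier_y + 1 and earlier_m == 12 and later_m == 1

def group_sequentially_parsed(months):
    pairs = sorted(parse_as_pair(s) for s in months)  # parse each string once
    result = []
    if pairs:
        span = [pairs[0]]
        for earlier, later in zip(pairs, pairs[1:]):
            if not goes_after(earlier, later):
                result.append(span)  # current span is over
                span = []
            span.append(later)
        result.append(span)  # the last span is still open here
    # Convert the (year, month) pairs back to 'M/YYYY' strings.
    return [[format_pair(p) for p in span] for span in result]

months = ['1/2013', '7/2013', '2/2013', '3/2013', '4/2014', '12/2013',
          '10/2013', '11/2013', '1/2014', '2/2014']
print(group_sequentially_parsed(months))
```

Since (year, month) tuples already sort chronologically, sorted() needs no key here.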