Filling missing values in a list of dictionaries with month and year

Question

I am preparing data for visualization. The data structure looks like this:

data = [{u'count': 1, u'_id': {u'year': 2010, u'month': 4}}, {u'count': 1, u'_id': {u'year': 2010, u'month': 5}}, {u'count': 2, u'_id': {u'year': 2010, u'month': 7}}, {u'count': 1, u'_id': {u'year': 2010, u'month': 9}}, {u'count': 1, u'_id': {u'year': 2010, u'month': 10}}, {u'count': 4, u'_id': {u'year': 2010, u'month': 12}}]

I convert it into a list of timestamp and count pairs like this:

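import datetime
import time

chart = []
for month in data:
    # Parse "YEAR-MONTH" into a datetime for the first day of that month
    d = datetime.datetime.strptime(str(month['_id']['year']) + "-" + str(month['_id']['month']), '%Y-%m')
    dat = time.mktime(d.timetuple())  # Unix timestamp in seconds (local time)
    chart.append([dat * 1000, month['count']])  # milliseconds for the chart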

The result looks like this (this example does not correspond to the sample input data above):

chart: [[1220216400000.0, 1], [1222808400000.0, 8], [1225490400000.0, 1], [1228082400000.0, 6], [1230760800000.0, 4], [1233439200000.0, 1], [1235858400000.0, 1], [1238533200000.0, 1], [1241125200000.0, 2], [1243803600000.0, 1], [1246395600000.0, 1], [1249074000000.0, 1]]

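What I am trying to do is change the code so that the months missing between the first and last dates are included with count = 0. For example, in the data, the entry after month 5 of 2010 is month 7 of 2010. Month 6 is missing, and I would like to include it with count = 0.

Any ideas?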

Answer 1

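Here is one way to do it.

The idea is to build a dictionary that maps dat -> count. If you don't know how many years the data will contain, you need to initialize the monthly counts on each iteration: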

import datetime
from pprint import pprint
import time

data = [{u'count': 1, u'_id': {u'year': 2010, u'month': 4}}, {u'count': 1, u'_id': {u'year': 2010, u'month': 5}},
        {u'count': 2, u'_id': {u'year': 2010, u'month': 7}}, {u'count': 1, u'_id': {u'year': 2010, u'month': 9}},
        {u'count': 1, u'_id': {u'year': 2010, u'month': 10}}, {u'count': 4, u'_id': {u'year': 2010, u'month': 12}}]

chart = {}
for month in data:
    year = month['_id']['year']
    for m in range(1, 13):  # months 1 through 12 inclusive
        d = datetime.datetime.strptime(str(year) + "-" + str(m), '%Y-%m')
        dat = time.mktime(d.timetuple()) * 1000
        if dat not in chart:
            chart[dat] = 0

    d = datetime.datetime.strptime(str(year) + "-" + str(month['_id']['month']), '%Y-%m')
    dat = time.mktime(d.timetuple()) * 1000
    chart[dat] = month['count']

pprint(sorted(chart.items()))

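If you know which years the data contains, initialize the month counts before looping over data.

Prints (the exact timestamps depend on the local time zone, because time.mktime interprets the date as local time):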

[(1262322000000.0, 0),
 (1265000400000.0, 0),
 (1267419600000.0, 0),
 (1270094400000.0, 1),
 (1272686400000.0, 1),
 (1275364800000.0, 0),
 (1277956800000.0, 2),
 (1280635200000.0, 0),
 (1283313600000.0, 1),
 (1285905600000.0, 1),
 (1288584000000.0, 0),
 (1291179600000.0, 4)]

As you can see, the missing months have a count of 0.

Hope that helps.
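
The pre-initialization variant mentioned above (for the case where the years are known in advance) could be sketched as follows. The known_years list and the month_key helper are names made up for this illustration, and the data is shortened to three entries:

```python
import datetime
import time

data = [
    {'count': 1, '_id': {'year': 2010, 'month': 4}},
    {'count': 2, '_id': {'year': 2010, 'month': 7}},
    {'count': 4, '_id': {'year': 2010, 'month': 12}},
]


def month_key(year, month):
    """Millisecond timestamp (local time) for the first day of a month."""
    d = datetime.datetime(year, month, 1)
    return time.mktime(d.timetuple()) * 1000


known_years = [2010]  # assumption: the years are known up front

# Initialize every month of every known year to 0 before reading the data.
chart = {month_key(y, m): 0 for y in known_years for m in range(1, 13)}

# Overwrite the zero with the real count wherever the data has an entry.
for item in data:
    key = month_key(item['_id']['year'], item['_id']['month'])
    chart[key] = item['count']

result = sorted(chart.items())
```

Here result holds twelve (timestamp, count) pairs for 2010, with 0 for every month that has no entry in data.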

Answer 2

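Here is a solution that uses the dateutil library to iterate over a date range month by month.

The idea is to initialize an OrderedDict with the datetime as the key and the count as the value. Then, for each item in the ordered dictionary, iterate month by month over the date range between the previously added item and the current one, adding a count of 0 for each missing month: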

from collections import OrderedDict
import datetime
from pprint import pprint
import time
from dateutil.rrule import rrule, MONTHLY


data = [{u'count': 1, u'_id': {u'year': 2010, u'month': 4}}, {u'count': 1, u'_id': {u'year': 2010, u'month': 5}},
        {u'count': 2, u'_id': {u'year': 2010, u'month': 7}}, {u'count': 1, u'_id': {u'year': 2010, u'month': 9}},
        {u'count': 1, u'_id': {u'year': 2010, u'month': 10}}, {u'count': 4, u'_id': {u'year': 2010, u'month': 12}}]

new_data = OrderedDict()
for item in data:
    year, month = item['_id']['year'], item['_id']['month']
    d = datetime.datetime.strptime(str(year) + "-" + str(month), '%Y-%m')
    new_data[d] = item['count']

chart = {}
last_added = None
for d, count in new_data.items():
    date_start = last_added if last_added else d
    for dt in rrule(MONTHLY, dtstart=date_start, until=d):
        key = time.mktime(dt.timetuple()) * 1000
        if key not in chart:
            chart[key] = count if dt == d else 0
    last_added = d

pprint(sorted(chart.items()))

Prints:

[(1270094400000.0, 1),
 (1272686400000.0, 1),
 (1275364800000.0, 0),
 (1277956800000.0, 2),
 (1280635200000.0, 0),
 (1283313600000.0, 1),
 (1285905600000.0, 1),
 (1288584000000.0, 0),
 (1291179600000.0, 4)]
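
The question's chart is a list of [timestamp, count] pairs, while the code above ends with a dict. A minimal sketch of converting back, assuming a dict shaped like chart (the sample values are copied from the first three rows of the output above):

```python
# Sample dict shaped like `chart`: millisecond timestamp -> count.
chart = {
    1272686400000.0: 1,
    1270094400000.0: 1,
    1275364800000.0: 0,
}

# Sort by timestamp and emit [timestamp, count] pairs.
chart_list = [[ts, count] for ts, count in sorted(chart.items())]
print(chart_list)
# [[1270094400000.0, 1], [1272686400000.0, 1], [1275364800000.0, 0]]
```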

Hope it works for you.

Answer 3

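I see that your list is sorted, so you only need to remember the previous date (initially set to 1) and fill in the missing elements in the list (that is, whenever the difference between month['_id']['month'] and the previous date is greater than 1).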
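
A rough sketch of this idea, using only built-in Python. The function name fill_missing_months is made up for this illustration. It assumes the input is sorted by year and month, and it starts from the first entry in the data rather than from month 1, since the question only asks for the months between the first and last dates:

```python
data = [
    {'count': 1, '_id': {'year': 2010, 'month': 4}},
    {'count': 1, '_id': {'year': 2010, 'month': 5}},
    {'count': 2, '_id': {'year': 2010, 'month': 7}},
    {'count': 1, '_id': {'year': 2010, 'month': 9}},
    {'count': 1, '_id': {'year': 2010, 'month': 10}},
    {'count': 4, '_id': {'year': 2010, 'month': 12}},
]


def fill_missing_months(items):
    """Return (year, month, count) tuples with gaps filled by count 0.

    Assumes `items` is sorted by year and month, as in the question.
    """
    filled = []
    prev = None  # (year, month) of the previously seen entry
    for item in items:
        year, month = item['_id']['year'], item['_id']['month']
        if prev is not None:
            # Step one month at a time from the previous entry up to this one.
            y, m = prev
            while True:
                y, m = (y, m + 1) if m < 12 else (y + 1, 1)
                if (y, m) >= (year, month):
                    break
                filled.append((y, m, 0))
        filled.append((year, month, item['count']))
        prev = (year, month)
    return filled


filled = fill_missing_months(data)
```

Each tuple in filled can then be turned into a timestamp the same way as in the question, with strptime and time.mktime.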

3 answers

Answer 1
1 2014-03-31 12:46:47

Answer 2
1 (accepted) 2014-03-31 14:27:57

Answer 3
0 2014-03-31 12:45:41
