Python - Round a CSV column to the nearest 30 minutes

Question

My CSV data is the following:

Columns:

CRASH_MONTH (e.g. "1")
CRASH_DAY (e.g. "1")
TIMESTR (e.g. "8:40")

Wished result:

A new column named "CRASH_DATETIME" holding a Python datetime object built from the corresponding date. The year doesn't matter; the main goal is to track crashes by month, day and hour:minutes, with the time rounded to the nearest 30 minutes.

I tried the following, but it failed:

from datetime import datetime, timedelta

def ceil_dt(month, day, hourWithMinutes, delta):
   hour,minutes = hourWithMinutes.split(':')
   int(month)
   int(day)
   int(hour)
   int(minutes)

   dt = datetime.datetime(month=month, day=day, hour=hour, minute=minutes)
   return dt + (datetime.min - dt) % delta

and

dataInitial['TIME'] = dataInitial.apply(lambda row: ceil_dt(row['CRASH_MONTH'], row['CRASH_DAY'], row['TIMESTR'], '30'))

But it failed (using Jupyter Notebook):

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:14010)()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-40-a9ef29fd7eb7> in <module>()
----> 1 dataInitial['TIME'] = dataInitial.apply(lambda row: ceil_dt(row['CRASH_MONTH'], row['CRASH_DAY'], row['TIMESTR'], '30'))

~/anaconda2/envs/tfdeeplearning/lib/python3.5/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4260                         f, axis,
   4261                         reduce=reduce,
-> 4262                         ignore_failures=ignore_failures)
   4263             else:
   4264                 return self._apply_broadcast(f, axis)

~/anaconda2/envs/tfdeeplearning/lib/python3.5/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4356             try:
   4357                 for i, v in enumerate(series_gen):
-> 4358                     results[i] = func(v)
   4359                     keys.append(v.name)
   4360             except Exception as e:

<ipython-input-40-a9ef29fd7eb7> in <lambda>(row)
----> 1 dataInitial['TIME'] = dataInitial.apply(lambda row: ceil_dt(row['CRASH_MONTH'], row['CRASH_DAY'], row['TIMESTR'], '30'))

~/anaconda2/envs/tfdeeplearning/lib/python3.5/site-packages/pandas/core/series.py in __getitem__(self, key)
    599         key = com._apply_if_callable(key, self)
    600         try:
--> 601             result = self.index.get_value(self, key)
    602 
    603             if not is_scalar(result):

~/anaconda2/envs/tfdeeplearning/lib/python3.5/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   2475         try:
   2476             return self._engine.get_value(s, k,
-> 2477                                           tz=getattr(series.dtype, 'tz', None))
   2478         except KeyError as e1:
   2479             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4404)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4087)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5210)()

KeyError: ('CRASH_MONTH', 'occurred at index CRASH_DATE')

Any ideas?

Answer 1

Your function has some minor problems: the int() conversions are not stored back in the variables, the year is missing, and delta is never turned into a timedelta. This version of the function works properly:

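from datetime import datetime, timedelta

def ceil_dt(month, day, hourWithMinutes, delta):
    # Split "H:MM" and store the converted integers back in the variables
    hour, minutes = hourWithMinutes.split(':')
    month = int(month)
    day = int(day)
    hour = int(hour)
    minutes = int(minutes)

    # datetime() requires a year; any fixed value works because only month, day and time matter
    dt = datetime(year=2019, month=month, day=day, hour=hour, minute=minutes)

    # Round up to the next multiple of `delta` minutes
    return dt + (datetime.min - dt) % timedelta(minutes=int(delta))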
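Note that ceil_dt always rounds up (8:40 becomes 9:00), while the question asks for the nearest 30 minutes. A minimal sketch of a nearest-rounding variant follows. The name round_dt, the default year of 2019 and the choice to round exact ties upward are assumptions made for illustration, not part of the original answer.

```python
from datetime import datetime, timedelta

def round_dt(month, day, hour_with_minutes, delta_minutes=30, year=2019):
    """Round a month/day/"H:MM" triple to the nearest multiple of delta_minutes."""
    hour, minutes = (int(part) for part in hour_with_minutes.split(":"))
    dt = datetime(year=year, month=int(month), day=int(day), hour=hour, minute=minutes)
    delta = timedelta(minutes=int(delta_minutes))

    # Shift forward by half a step, then floor to the previous multiple of delta.
    # Exact ties (e.g. 8:45 with a 30-minute step) therefore round up.
    shifted = dt + delta / 2
    return shifted - (shifted - datetime.min) % delta

print(round_dt("1", "1", "8:40"))
print(round_dt("1", "1", "8:50"))
```

With pandas, the row-wise call would presumably look like dataInitial.apply(lambda row: round_dt(row['CRASH_MONTH'], row['CRASH_DAY'], row['TIMESTR']), axis=1). The axis=1 argument makes apply pass rows instead of columns; its absence is the likely cause of the KeyError in the question, where the lambda was handed the CRASH_DATE column.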

1 Accepted 2019-02-24 19:31:59