Python Pandas Groupby Dropping DateTime Columns

Question

I am having some trouble using groupby.median() and groupby.mean() on a DataFrame containing intermittent NaT values. Specifically, I have several columns in a dataset calculating various time differences based on other columns. In some instances, no time difference exists, causing a NaT value similar to the example below:

Group    Category    Start Time      End Time      Time Diff
  A         1        08:00:00.000    08:00:00.500      .500
  B         1        09:00:00.000    09:02:00.000  2:00.000
  B         1        09:00:00.000      NaT           NaT
  A         2        09:00:00.000    09:02:00.000  2:00.000
  A         2        09:00:00.000    09:01:00.000  1:00.000
  A         2        08:00:00.000    08:00:01.500     1.500

Any time I run df.groupby(['Group', 'Category']).median() or .mean(), any column that contains NaT is dropped from the result set. I've attempted a fillna, but the NaTs seemed to remain. As an added point of context, this script worked correctly in an older version of Anaconda Python (1.x). I was recently able to upgrade my work computer to 2.0.1, at which point this issue began cropping up.

EDIT: I will leave my thoughts about NaTs above in case they are a factor, but upon further review it seems that my problem actually lies in the fact that these columns are timedelta64s. Does anyone know of any workarounds to obtain mean/median on timedeltas?
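For anyone trying to confirm the same diagnosis, a minimal sketch of how the timedelta64 columns could be identified is shown here. The small DataFrame is a hypothetical stand-in for the table above, not the real dataset, and only the columns needed for the check are included.

```python
import pandas as pd

# Hypothetical miniature of the example table (assumed values, not real data).
df = pd.DataFrame({
    "Group": ["A", "B", "B"],
    "Category": [1, 1, 1],
    "Time Diff": pd.Series(
        [pd.Timedelta(milliseconds=500), pd.Timedelta(minutes=2), pd.NaT],
        dtype="timedelta64[ns]",
    ),
})

# List the columns stored as timedelta64; these are the ones in question.
timedelta_cols = df.select_dtypes(include="timedelta").columns.tolist()
print(timedelta_cols)
```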

Thanks very much for any insight you may have!

Answer 1

After some further googling/experimentation I confirmed that the issue appeared to be related to columns which were timedelta64. In order to perform a groupby on these columns I first converted them to floats like so:

df['End Time'] = df['End Time'].astype('timedelta64[ms]') / 86400000

There may be a more elegant solution to this, but this allowed me to move forward with my analysis.
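One possible variant of the same idea, assuming a reasonably recent pandas version, is to convert the timedelta column to float seconds with `.dt.total_seconds()`, aggregate, and convert the results back. This is only a sketch: the sample data is retyped from the question's table, and the `diff_seconds` column name is made up for illustration.

```python
import pandas as pd

# Sample data retyped from the question's table (Time Diff column only).
df = pd.DataFrame({
    "Group": ["A", "B", "B", "A", "A", "A"],
    "Category": [1, 1, 1, 2, 2, 2],
    "Time Diff": pd.Series(
        [
            pd.Timedelta(milliseconds=500),
            pd.Timedelta(minutes=2),
            pd.NaT,
            pd.Timedelta(minutes=2),
            pd.Timedelta(minutes=1),
            pd.Timedelta(seconds=1.5),
        ],
        dtype="timedelta64[ns]",
    ),
})

# Convert to float seconds; NaT becomes NaN, which mean/median skip by default.
# "diff_seconds" is a hypothetical helper column name.
df["diff_seconds"] = df["Time Diff"].dt.total_seconds()

# Aggregate the float column per (Group, Category).
stats = df.groupby(["Group", "Category"])["diff_seconds"].agg(["mean", "median"])

# Convert the aggregated seconds back to timedeltas for readability.
stats_td = stats.apply(lambda col: pd.to_timedelta(col, unit="s"))
print(stats_td)
```

Working in seconds rather than fractional days keeps the intermediate numbers easy to read, and the final conversion restores a timedelta result.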

Thanks!
