pd.to_datetime not converting datetime into int for df.rolling calculation

I'm attempting to compute a 10-minute rolling average on a data set with irregular time steps. I get the error shown below:

Traceback (most recent call last):
  File "asosreaderpandas.py", line 13, in <module>
    df.rolling('10min').mean()
  File "/opt/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 8900, in rolling
    on=on, axis=axis, closed=closed)
  File "/opt/anaconda3/lib/python3.6/site-packages/pandas/core/window.py", line 2469, in rolling
    return Rolling(obj, **kwds)
  File "/opt/anaconda3/lib/python3.6/site-packages/pandas/core/window.py", line 80, in __init__
    self.validate()
  File "/opt/anaconda3/lib/python3.6/site-packages/pandas/core/window.py", line 1478, in validate
    raise ValueError("window must be an integer")
ValueError: window must be an integer

This is the code I am using to create my rolling average. I would enter my timestamps manually, since that has solved this problem for me in the past, but the .txt file is 98,000 lines long...

import pandas as pd
from datetime import datetime

df = pd.read_csv('KART.txt', header = 0)
#indexing the date format from txt file
pd.to_datetime(df.index, format='%Y-%m-%d %H:%M')
#creating ten minute average
df.rolling('10min').mean()
print(df)

I don't understand the pandas module well. I have tried several different ways of assigning my datetime, to no avail. Am I going about this completely wrong?

Dataset sample:

0,1
2019-01-01 00:00:00,4
2019-01-01 00:05:00,4
2019-01-01 00:10:00,4
2019-01-01 00:15:00,4
2019-01-01 00:25:00,5
2019-01-01 00:30:00,4
2019-01-01 00:35:00,4
2019-01-01 00:40:00,4
2019-01-01 00:45:00,4
2019-01-01 00:50:00,4
2019-01-01 00:55:00,4
2019-01-01 00:56:00,4
2019-01-01 01:00:00,4
...

You have multiple issues in your code:

  1. You load the dataframe without specifying an index column, so pandas assigns an automatic integer index. That integer index is what you later try to convert to datetime, which is not what you want.

  2. You don't save the result when you convert the index to datetime. pd.to_datetime returns a new object, so it has to be assigned back to df.index.
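
Both issues can be reproduced on a few rows of the sample. This is only a minimal sketch: it assumes an in-memory io.StringIO buffer standing in for KART.txt, and the variable names are illustrative, not part of the original code.

```python
import io
import pandas as pd

# Assumption: a few rows of the sample, held in memory instead of 'KART.txt'.
sample = io.StringIO(
    "0,1\n"
    "2019-01-01 00:00:00,4\n"
    "2019-01-01 00:05:00,4\n"
    "2019-01-01 00:10:00,4\n"
)

# Same call as in the question: no index_col is given.
df = pd.read_csv(sample, header=0)

# Issue 1: the index is an automatic integer RangeIndex, not the timestamps.
index_type = type(df.index).__name__
print(index_type)

# Issue 2: pd.to_datetime returns a new object and leaves df untouched.
converted = pd.to_datetime(df.index)
index_type_after = type(df.index).__name__
print(index_type_after)

# An offset window such as '10min' needs a datetime-like index, so this raises.
try:
    df.rolling('10min').mean()
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The exact wording of the error message may differ between pandas versions, but the cause is the same: the index is still made of integers.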

Here's the fixed version:

import pandas as pd

df = pd.read_csv('KART.txt', header=0, index_col=0)  # <- use the first column as the index
# Assign the converted index back to df.index; %S matches the seconds in the sample timestamps
df.index = pd.to_datetime(df.index, format='%Y-%m-%d %H:%M:%S')
df.rolling('10min').mean()  # returns a new dataframe; assign it if you want to keep the result
>                     1
0   
2019-01-01 00:00:00 4.0
2019-01-01 00:05:00 4.0
2019-01-01 00:10:00 4.0
2019-01-01 00:15:00 4.0
2019-01-01 00:25:00 5.0
2019-01-01 00:30:00 4.5
2019-01-01 00:35:00 4.0
2019-01-01 00:40:00 4.0
2019-01-01 00:45:00 4.0
2019-01-01 00:50:00 4.0
2019-01-01 00:55:00 4.0
2019-01-01 00:56:00 4.0
2019-01-01 01:00:00 4.0
...
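
The 4.5 at 00:30 in the output above follows from how an offset window is built: by default each window is closed on the right, so the row at time t averages every row in the interval (t - 10min, t]. The sketch below checks this on four rows of the sample. It assumes an in-memory io.StringIO buffer in place of KART.txt, and the names rolled and values are illustrative.

```python
import io
import pandas as pd

# Assumption: four rows of the sample around the single value of 5.
sample = io.StringIO(
    "0,1\n"
    "2019-01-01 00:15:00,4\n"
    "2019-01-01 00:25:00,5\n"
    "2019-01-01 00:30:00,4\n"
    "2019-01-01 00:35:00,4\n"
)

df = pd.read_csv(sample, header=0, index_col=0)
df.index = pd.to_datetime(df.index)

# rolling() does not modify df, so keep the result in a new variable.
rolled = df.rolling('10min').mean()

# The header row is "0,1", so the data column is named '1' (a string).
values = rolled['1'].tolist()
print(values)  # [4.0, 5.0, 4.5, 4.0]
```

The row at 00:25 sees only itself, because 00:15 lies exactly on the open left edge of its window. The row at 00:30 averages 5 and 4, and by 00:35 the 5 has dropped out again.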

EDIT
Thanks to a comment from Parfait, you can get an even shorter version of the code by parsing the dates directly in read_csv:

import pandas as pd

df = pd.read_csv('KART.txt',
                 header=0,
                 index_col=0,       # <-- use the first column as the index
                 parse_dates=True)  # <-- parse the index as dates while reading

df.rolling('10min').mean()
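
To confirm that the shorter version really produces a datetime index, here is a small self-contained sketch. It assumes an in-memory io.StringIO buffer in place of KART.txt; the names is_datetime_index and rolled are illustrative. It also stores the rolling result in its own variable, since printing df afterwards (as in the question) would show the original, unaveraged data.

```python
import io
import pandas as pd

# Assumption: a few rows of the sample, held in memory instead of 'KART.txt'.
sample = io.StringIO(
    "0,1\n"
    "2019-01-01 00:10:00,4\n"
    "2019-01-01 00:15:00,4\n"
    "2019-01-01 00:25:00,5\n"
    "2019-01-01 00:30:00,4\n"
)

# index_col=0 makes the timestamps the index; parse_dates=True converts them.
df = pd.read_csv(sample, header=0, index_col=0, parse_dates=True)
is_datetime_index = isinstance(df.index, pd.DatetimeIndex)
print(is_datetime_index)

# Keep the averaged values; df itself is left unchanged by rolling().
rolled = df.rolling('10min').mean()
print(rolled)
```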
