Calculate the average date every x rows

Previously I posted a question about calculating the average value for every 10 rows ("Calculating the average value for every 10 cells in each column by pandas"). Zero's code solved that successfully.

However, I now get an error when calculating the average of the date/time column:

import glob

import numpy as np
import pandas as pd

location2 = 'C:\\Users\\Poon\\Downloads\\20211014_SBS_BEMS\\20211014_SBS_BEMS\\Test1044.csv'
csvfiles2 = glob.glob(location2)

df3 = pd.DataFrame()

for file_new_2 in csvfiles2:
    df3 = pd.read_csv(file_new_2)

    # First column parsed as datetime, remaining columns as numbers
    df4 = pd.concat([pd.to_datetime(df3.iloc[:, 0]), df3.iloc[:, 1:].apply(pd.to_numeric)], axis=1)
    df4.dropna(inplace=True)
    # Average every block of 10 consecutive rows
    df4 = df4.groupby(np.arange(len(df4)) // 10).mean()

print(df4)

The error message is:

Unable to parse string "2019-05-19 00:00:00" at position 0

I guess the values produced by pd.to_datetime cannot be summed up and then divided by 10?

Here is some of the data from my Excel file; in total there are about 100k rows.

19/5/2019 0:00  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:01  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:02  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:03  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:04  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:05  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:06  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:07  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:08  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:09  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:10  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:11  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:12  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:13  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:14  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:15  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:16  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:17  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:18  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:19  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:20  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:21  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:22  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:23  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:24  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:25  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:26  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:27  8840    20  237 64.93   82.35   16.15   46.88

Assuming the timestamps are in column 0 of the provided example, convert them to integers, group by the floor division of the index by 10, and aggregate:

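import numpy as np
import pandas as pd

# df[0] holds the day-first timestamp strings (CSV read with header=None)
out = pd.to_datetime(pd.to_datetime(df[0], dayfirst=True)
                       .astype(np.int64)           # nanoseconds since the epoch
                       .groupby(df.index // 10)    # blocks of 10 rows
                       .mean())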

Output:

0   2019-05-19 00:04:30
1   2019-05-19 00:14:30
2   2019-05-19 00:23:30
Name: 0, dtype: datetime64[ns]
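
To average the timestamp and the numeric columns in one pass, the same integer trick can be combined with an ordinary groupby on the remaining columns. The following is a minimal sketch, not the answerer's code: the in-memory sample (25 one-minute rows with three numeric columns) and the variable names are assumptions made for illustration, and the explicit cast to datetime64[ns] is there so the integers are always nanoseconds.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample: 25 one-minute rows in the question's day-first format
rows = [f"19/5/2019 0:{m:02d},8840,20,237" for m in range(25)]
df = pd.read_csv(io.StringIO("\n".join(rows)), header=None)

# Group label for every row: 0 for rows 0-9, 1 for rows 10-19, and so on
groups = np.arange(len(df)) // 10

# Parse the day-first strings and force nanosecond resolution
ts = pd.to_datetime(df[0], format="%d/%m/%Y %H:%M").astype("datetime64[ns]")

# Average the timestamps as integers (nanoseconds since the epoch),
# then convert the averaged values back to datetimes
mean_ts = pd.to_datetime(ts.astype("int64").groupby(groups).mean())

# Average the numeric columns with the same grouping
mean_vals = df.iloc[:, 1:].groupby(groups).mean()

out = pd.concat([mean_ts, mean_vals], axis=1)
print(out)
```

The last group is averaged over however many rows remain, which is why the third timestamp in the output above is not a full ten-minute step after the second.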

You can use resample:

>>> (df.assign(dt=pd.to_datetime(df.iloc[:, 0], dayfirst=True)).iloc[:, 1:]
...    .resample('10min', on='dt').mean())

                          1     2      4      5      6
dt                                                    
2019-05-19 00:00:00  8840.0  20.0  82.35  16.15  46.88
2019-05-19 00:10:00  8840.0  20.0  82.35  16.15  46.88
2019-05-19 00:20:00  8840.0  20.0  82.35  16.15  46.88

It can be easier if your columns have names.
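
As a rough illustration of the named-column variant, the sketch below reads a small in-memory sample with explicit column names and resamples on the timestamp column. The names `timestamp`, `power` and `setpoint` are hypothetical, chosen only for this example; the real file's column meanings are not given in the question.

```python
import io

import pandas as pd

# Hypothetical sample: 25 one-minute rows in the question's day-first format
rows = [f"19/5/2019 0:{m:02d},8840,20" for m in range(25)]
names = ["timestamp", "power", "setpoint"]  # assumed names, not from the source
df = pd.read_csv(io.StringIO("\n".join(rows)), header=None, names=names)

# Parse the day-first strings into real datetimes
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d/%m/%Y %H:%M")

# Average every 10-minute clock interval; the timestamp column becomes the index
out = df.resample("10min", on="timestamp").mean()
print(out)
```

Note that resample groups by clock time and labels each bin with its start, whereas the integer approach groups by row count and returns the mean timestamp. The two only line up when the rows are evenly spaced with no gaps.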
