Calculate the average date every x rows
I previously posted a question about calculating the average value for every 10 rows (Calculating the average value for every 10 cells in each column by pandas), and Zero's code solved it.
However, I get an error when I also try to average the date/time column:
import glob
import numpy as np
import pandas as pd
location2='C:\\Users\\Poon\\Downloads\\20211014_SBS_BEMS\\20211014_SBS_BEMS\\Test1044.csv'
csvfiles2=glob.glob(location2)
df3=pd.DataFrame()
for file_new_2 in csvfiles2:
    df3=pd.read_csv(file_new_2)
    # column 0 holds the timestamps, the remaining columns hold numeric readings
    df4=pd.concat([pd.to_datetime(df3.iloc[:,0]), df3.iloc[:, 1:].apply(pd.to_numeric)], axis = 1)
    df4.dropna(inplace = True)
    # group every 10 consecutive rows and average each column
    df4= df4.groupby(np.arange(len(df4))//10).mean()
    print(df4)
The error message is:
Unable to parse string "2019-05-19 00:00:00" at position 0
I guess the output of pd.to_datetime cannot be summed and then divided by 10?
Here is a sample of the data from my Excel file; in total there are about 100k rows.
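The guess is half right. The sketch below (the ten timestamps are built with `pd.date_range` to mirror the first ten sample rows; this is an illustration, not the asker's data) shows that two pandas `Timestamp`s cannot be added, yet the mean of a datetime Series is still well defined, because pandas averages the underlying nanosecond values and converts the result back to a `Timestamp`.

```python
import pandas as pd

# Ten one-minute timestamps, mirroring the first ten sample rows (00:00 .. 00:09).
stamps = pd.Series(pd.date_range("2019-05-19 00:00", periods=10, freq="min"))

# Adding two Timestamps is undefined, so a naive "sum then divide" fails...
try:
    stamps.iloc[0] + stamps.iloc[1]
except TypeError as exc:
    print("sum fails:", type(exc).__name__)

# ...but the mean is well defined: pandas averages the nanosecond
# offsets from the epoch and converts the result back to a Timestamp.
direct = stamps.mean()

# The same result computed by hand through the int64 representation.
manual = pd.Timestamp(int(stamps.astype("int64").mean()))

print(direct)
print(manual == direct)
```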
19/5/2019 0:00 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:01 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:02 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:03 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:04 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:05 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:06 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:07 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:08 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:09 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:10 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:11 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:12 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:13 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:14 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:15 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:16 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:17 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:18 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:19 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:20 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:21 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:22 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:23 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:24 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:25 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:26 8840 20 237 64.93 82.35 16.15 46.88
19/5/2019 0:27 8840 20 237 64.93 82.35 16.15 46.88
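The first column is written day-first (19/5/2019), so it may be safer to parse it with an explicit format rather than letting pandas guess. A minimal loading sketch, assuming (not confirmed by the question) that Test1044.csv is comma-separated with no header row; only three sample rows are reproduced through `io.StringIO`.

```python
import io

import pandas as pd

# Assumption: the CSV is comma-separated with no header row.
sample = io.StringIO(
    "19/5/2019 0:00,8840,20,237,64.93,82.35,16.15,46.88\n"
    "19/5/2019 0:01,8840,20,237,64.93,82.35,16.15,46.88\n"
    "19/5/2019 0:02,8840,20,237,64.93,82.35,16.15,46.88\n"
)
df = pd.read_csv(sample, header=None)

# Parse column 0 as day/month/year with an explicit format, so an
# ambiguous value such as 12/5/2019 is never read month-first.
df[0] = pd.to_datetime(df[0], format="%d/%m/%Y %H:%M")
print(df.dtypes)
```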
Assuming the timestamps are in column 0 of the provided example, convert them to integers, group by the floor division of the index by 10, and aggregate:
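import numpy as np
import pandas as pd
# dayfirst=True because the sample dates are written as 19/5/2019
out = pd.to_datetime(pd.to_datetime(df[0], dayfirst=True)
                     .astype(np.int64)          # timestamps as nanoseconds since the epoch
                     .groupby(df.index//10)     # one group per block of 10 rows
                     .mean())                   # average, then convert back to datetime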
Output:
0 2019-05-19 00:04:30
1 2019-05-19 00:14:30
2 2019-05-19 00:23:30
Name: 0, dtype: datetime64[ns]
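To get every column of each 10-row block in one frame, the int64 trick can be combined with an ordinary numeric groupby. A sketch, assuming the data sits in a DataFrame shaped like the sample (timestamps in column 0, readings in columns 1-7); the frame is rebuilt here from the sample values, and `rows`, `blocks`, `numeric_means` and `time_means` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV, rebuilt from the 28 sample rows.
rows = [[f"19/5/2019 0:{m:02d}", 8840, 20, 237, 64.93, 82.35, 16.15, 46.88]
        for m in range(28)]
df = pd.DataFrame(rows)

# Parse the timestamps explicitly as day/month/year.
stamps = pd.to_datetime(df[0], format="%d/%m/%Y %H:%M")

# One group label per block of 10 consecutive rows: 0,0,...,0,1,1,...,2
blocks = np.arange(len(df)) // 10

# Average the numeric columns per block.
numeric_means = df.iloc[:, 1:].apply(pd.to_numeric).groupby(blocks).mean()

# Average the timestamps per block via their int64 (nanosecond) values,
# then turn the float means back into datetimes.
time_means = pd.to_datetime(stamps.astype("int64").groupby(blocks).mean())

result = numeric_means.assign(time=time_means)
print(result[["time", 1, 7]])
```

The last block holds only 8 rows, so its average timestamp is 00:23:30 rather than a round value, matching the output above.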
You can use resample:
>>> (df.iloc[:, 1:].assign(dt=pd.to_datetime(df.iloc[:, 0], dayfirst=True))
...     .resample('10min', on='dt').mean())
1 2 4 5 6
dt
2019-05-19 00:00:00 8840.0 20.0 82.35 16.15 46.88
2019-05-19 00:10:00 8840.0 20.0 82.35 16.15 46.88
2019-05-19 00:20:00 8840.0 20.0 82.35 16.15 46.88
This is easier if your columns have names.
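With named columns the resample call reads more clearly. A sketch, assuming hypothetical column names (the real CSV headers are not shown) and a frame rebuilt from the sample rows:

```python
import pandas as pd

# Hypothetical column names; the original file's headers are unknown.
names = ["timestamp", "a", "b", "c", "d", "e", "f", "g"]
rows = [[f"19/5/2019 0:{m:02d}", 8840, 20, 237, 64.93, 82.35, 16.15, 46.88]
        for m in range(28)]
df = pd.DataFrame(rows, columns=names)

# Parse the day-first timestamps explicitly.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d/%m/%Y %H:%M")

# Clock-aligned 10-minute bins, each labelled by its start time.
out = df.resample("10min", on="timestamp").mean()
print(out[["a", "b"]])
```

Note that resample bins by wall-clock 10-minute intervals, so it only matches "every 10 rows" when there is exactly one row per minute with no gaps; grouping by `index // 10` averages blocks of rows regardless of their spacing.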