
Calculate the average date every x rows

Previously I posted a question asking how to calculate the average value for every 10 rows, which Zero's code answered successfully: Calculating the average value for every 10 cells in each column by pandas

However, that approach raises an error when averaging the date/time column.

import glob

import numpy as np
import pandas as pd

location2 = 'C:\\Users\\Poon\\Downloads\\20211014_SBS_BEMS\\20211014_SBS_BEMS\\Test1044.csv'
csvfiles2 = glob.glob(location2)

df3 = pd.DataFrame()

for file_new_2 in csvfiles2:
    df3 = pd.read_csv(file_new_2)

    # Parse the first column as datetimes and the remaining columns as numbers
    df4 = pd.concat([pd.to_datetime(df3.iloc[:, 0]), df3.iloc[:, 1:].apply(pd.to_numeric)], axis=1)
    df4.dropna(inplace=True)
    # This line raises the error: mean() cannot average the datetime column
    df4 = df4.groupby(np.arange(len(df4)) // 10).mean()

print(df4)

The error message is:

Unable to parse string "2019-05-19 00:00:00" at position 0

I guess the values produced by pd.to_datetime cannot be summed up and then divided by 10?

Here are some of the data from my Excel file, but in total there are about 100k rows.

19/5/2019 0:00  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:01  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:02  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:03  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:04  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:05  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:06  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:07  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:08  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:09  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:10  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:11  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:12  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:13  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:14  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:15  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:16  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:17  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:18  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:19  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:20  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:21  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:22  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:23  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:24  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:25  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:26  8840    20  237 64.93   82.35   16.15   46.88
19/5/2019 0:27  8840    20  237 64.93   82.35   16.15   46.88

Assuming column 0 in the provided example, convert the timestamps to integers, group by the floor division of the index by 10, and aggregate:

import numpy as np
import pandas as pd

# Convert to nanoseconds since the epoch, average each group of 10,
# then convert the result back to datetimes
out = pd.to_datetime(pd.to_datetime(df[0])
                       .astype(np.int64)
                       .groupby(df.index // 10)
                       .mean())

Output:

0   2019-05-19 00:04:30
1   2019-05-19 00:14:30
2   2019-05-19 00:23:30
Name: 0, dtype: datetime64[ns]
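Putting this together with the numeric columns, a self-contained sketch might look like the following (the miniature inline CSV stands in for the real file, and `dayfirst=True` is an assumption based on the `19/5/2019` sample format):

```python
import numpy as np
import pandas as pd
from io import StringIO

# Miniature stand-in for the real CSV (no header row, day-first dates)
csv_data = StringIO(
    "19/5/2019 0:00,8840,20\n"
    "19/5/2019 0:01,8840,20\n"
    "19/5/2019 0:02,8840,20\n"
    "19/5/2019 0:03,8840,20\n"
)
df = pd.read_csv(csv_data, header=None)

# Group label: floor division of the row index by 10
groups = df.index // 10

# Average timestamps via their integer (nanosecond) representation,
# then convert the per-group means back to datetimes
ts_mean = pd.to_datetime(
    pd.to_datetime(df[0], dayfirst=True).astype(np.int64).groupby(groups).mean()
)

# Average the numeric columns with the same grouping
num_mean = df.iloc[:, 1:].groupby(groups).mean()

out = pd.concat([ts_mean, num_mean], axis=1)
print(out)
```

Note that averaging nanosecond timestamps through float64 can lose sub-microsecond precision, which is harmless at minute resolution like this.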

You can use resample. Note that this bins by 10-minute time windows rather than strictly every 10 rows; the two coincide here because the data is sampled once per minute:

>>> (df4.assign(**{'dt': pd.to_datetime(df4.iloc[:, 0])})[1:]
        .resample('10T', on='dt').mean())

                          1     2      4      5      6
dt                                                    
2019-05-19 00:00:00  8840.0  20.0  82.35  16.15  46.88
2019-05-19 00:10:00  8840.0  20.0  82.35  16.15  46.88
2019-05-19 00:20:00  8840.0  20.0  82.35  16.15  46.88

This is easier if your columns have names.
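With named columns the same idea reads more cleanly. A minimal sketch, where the column names (`timestamp`, `power`, `temp`) and sample values are illustrative assumptions:

```python
import pandas as pd
from io import StringIO

# Hypothetical sample with a header row; names are made up for illustration
csv_data = StringIO(
    "timestamp,power,temp\n"
    "19/5/2019 0:00,8840,20\n"
    "19/5/2019 0:05,8840,20\n"
    "19/5/2019 0:10,8840,22\n"
    "19/5/2019 0:15,8840,22\n"
)
df = pd.read_csv(csv_data, parse_dates=["timestamp"], dayfirst=True)

# Bin rows into 10-minute windows keyed on the timestamp column
# and average every numeric column within each window
out = df.resample("10min", on="timestamp").mean()
print(out)
```

Unlike the floor-division approach, `resample` keeps its bins aligned to the clock even if some rows are missing, which may or may not be what you want for "every 10 rows".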
