I have several large CSV files, and the data looks like this:
YY-MO-DD HH-MI-SS_SSS | Temperature | Magnetic
2015-12-07 20:51:06:608 | 22.7 | 32.3
2015-12-07 20:51:07:609 | 22.5 | 47.7
.... ... ...
Now I want to use Python with pandas to create a CSV that looks like this:
Hour | Average Temp | Average Mag
20:00 | 22.6 | 40
21:00 | ... | ...
and so on for each of the 24 hours.
The second thing I want is the same averages, but for every day of the month:
Date | Average Temp | Average Mag
7-12-2015 | 22.6 | 40
8-12-2015 | ... | ...
Is there a good way to do this in Python? I've tried Excel, but the CSVs are very large and I have a bunch of them (ideally I would write a loop that does the same thing to every file).
Thank you!
You can first convert the timestamp column with to_datetime, then make it the index with set_index, and finally resample, aggregating with mean and std.
In version 0.18.0, using the new resample API:
import pandas as pd

# convert the timestamp column to datetime (note the ':' before the milliseconds)
df['YY-MO-DD HH-MI-SS_SSS'] = pd.to_datetime(df['YY-MO-DD HH-MI-SS_SSS'], format='%Y-%m-%d %H:%M:%S:%f')
# use the timestamp column as the index
df = df.set_index('YY-MO-DD HH-MI-SS_SSS')

# hourly mean
print(df.resample('H').mean())
                       Temperature  Magnetic
YY-MO-DD HH-MI-SS_SSS
2015-12-07 20:00:00           22.6      40.0

# hourly standard deviation
print(df.resample('H').std())
                       Temperature   Magnetic
YY-MO-DD HH-MI-SS_SSS
2015-12-07 20:00:00       0.141421  10.889444

# daily mean
print(df.resample('D').mean())
                       Temperature  Magnetic
YY-MO-DD HH-MI-SS_SSS
2015-12-07                    22.6      40.0

# daily standard deviation
print(df.resample('D').std())
                       Temperature   Magnetic
YY-MO-DD HH-MI-SS_SSS
2015-12-07                0.141421  10.889444
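For anyone who wants to reproduce the steps above from scratch, here is a minimal self-contained sketch. It assumes the file really is separated by " | " as displayed in the question (if yours is an ordinary comma-separated file, drop the sep and engine arguments), and it uses the lowercase 'h' alias because recent pandas releases warn about the uppercase 'H'.

```python
import io

import pandas as pd

# Two sample rows from the question, in the pipe-separated layout it shows.
raw = """YY-MO-DD HH-MI-SS_SSS | Temperature | Magnetic
2015-12-07 20:51:06:608 | 22.7 | 32.3
2015-12-07 20:51:07:609 | 22.5 | 47.7
"""

# Assumption: fields are separated by "|" with optional spaces around it.
# A regex separator requires the slower python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=r"\s*\|\s*", engine="python")

ts_col = "YY-MO-DD HH-MI-SS_SSS"
# Milliseconds follow a ':' rather than a '.', so the format is given explicitly.
df[ts_col] = pd.to_datetime(df[ts_col], format="%Y-%m-%d %H:%M:%S:%f")
df = df.set_index(ts_col)

hourly = df.resample("h").mean()  # one row per clock hour
daily = df.resample("D").mean()   # one row per calendar day

print(hourly)
print(daily)
```

With a real file, replace io.StringIO(raw) with the file path.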
If you want to aggregate everything into new columns, you can use:
# resample hourly and aggregate both mean and std
df1 = df.resample('H').agg(['mean','std'])
# flatten the two-level column names, e.g. ('Temperature', 'mean') -> 'Temperature mean'
df1.columns = [' '.join(col) for col in df1.columns]
print(df1.reset_index())
  YY-MO-DD HH-MI-SS_SSS  Temperature mean  Temperature std  Magnetic mean  \
0   2015-12-07 20:00:00              22.6         0.141421           40.0
   Magnetic std
0     10.889444

# the same aggregation per day
df2 = df.resample('D').agg(['mean','std'])
df2.columns = [' '.join(col) for col in df2.columns]
print(df2.reset_index())
  YY-MO-DD HH-MI-SS_SSS  Temperature mean  Temperature std  Magnetic mean  \
0            2015-12-07              22.6         0.141421           40.0
   Magnetic std
0     10.889444
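If you want the output to carry the exact headers from the question (Hour / Date, Average Temp, Average Mag), one possible approach is to rename the columns and format the index labels after resampling. This is only a sketch: the small DataFrame is rebuilt by hand from the two sample rows, and the date label is zero-padded (07-12-2015) rather than the 7-12-2015 shown in the question.

```python
import pandas as pd

# Sample data from the question, already indexed by timestamp.
idx = pd.to_datetime(["2015-12-07 20:51:06.608", "2015-12-07 20:51:07.609"])
df = pd.DataFrame({"Temperature": [22.7, 22.5], "Magnetic": [32.3, 47.7]}, index=idx)

# Headers requested in the question.
names = {"Temperature": "Average Temp", "Magnetic": "Average Mag"}

hourly = df.resample("h").mean().rename(columns=names)
hourly.index = hourly.index.strftime("%H:%M")  # e.g. "20:00"
hourly.index.name = "Hour"

daily = df.resample("D").mean().rename(columns=names)
daily.index = daily.index.strftime("%d-%m-%Y")  # e.g. "07-12-2015"
daily.index.name = "Date"

print(hourly)
print(daily)
```

Note that once a file spans more than one day, the "%H:%M" labels repeat (there is a 20:00 row for each day), so you may prefer to keep the full timestamp in that case.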
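If you need to write the result with to_csv (here the index is reset first, so the timestamps are kept as a regular column and the numeric row index is left out):
df1.reset_index().to_csv('myfile.csv', index=False)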
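Since the question also asks about repeating this for many files, here is a rough sketch of a loop. The helper names summarize_file and summarize_folder, the output naming scheme, and the assumption that the inputs are ordinary comma-separated files with the header from the question are all mine, not something from the original post.

```python
from pathlib import Path

import pandas as pd

TS_COL = "YY-MO-DD HH-MI-SS_SSS"  # timestamp header used in the question


def summarize_file(path, out_dir):
    """Write hourly and daily mean CSVs for one input file (hypothetical helper)."""
    path, out_dir = Path(path), Path(out_dir)
    # Assumption: comma-separated input; pass sep=... if your files differ.
    df = pd.read_csv(path)
    df[TS_COL] = pd.to_datetime(df[TS_COL], format="%Y-%m-%d %H:%M:%S:%f")
    df = df.set_index(TS_COL)
    out_dir.mkdir(parents=True, exist_ok=True)
    for rule, label in (("h", "hourly"), ("D", "daily")):
        # dropna(how="all") discards bins that contained no readings at all.
        summary = df.resample(rule).mean().dropna(how="all")
        summary.to_csv(out_dir / f"{path.stem}_{label}.csv")


def summarize_folder(in_dir, out_dir):
    """Apply summarize_file to every *.csv file directly inside in_dir."""
    for csv_path in sorted(Path(in_dir).glob("*.csv")):
        summarize_file(csv_path, out_dir)
```

Calling summarize_folder("data", "summaries") would then produce data file names such as a_hourly.csv and a_daily.csv for an input a.csv. Use a separate output folder, otherwise a second run would pick up the summary files as inputs.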
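In pandas versions before 0.18.0, the hourly mean could also be computed with the how argument of resample. That argument was deprecated in 0.18.0 and has been removed from later releases, so use .resample('H').mean() on current versions:
avg_temp = df.Temperature.resample('H', how='mean')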