
Python: Calculating Average and Standard deviation for every hour in csv file

I have several large CSV files, and the data looks like this:

YY-MO-DD HH-MI-SS_SSS    |     Temperature   |      Magnetic
2015-12-07 20:51:06:608  |        22.7       |        32.3
2015-12-07 20:51:07:609  |        22.5       |        47.7
  ....                            ...                  ...

Now I want to use Python with pandas to create a CSV that looks like this:

   Hour       |     Average Temp   |    Average Mag
   20:00      |         22.6       |       40
   21:00      |         ...        |       ...

and so on for each of the 24 hours.

Second, I want the same averages, but for every day of the month:

Date       |     Average Temp   |    Average Mag
7-12-2015  |         22.6       |       40
8-12-2015  |         ...        |       ...

Is there a good way to do this in Python? I've tried Excel, but the CSV files are very large and I have a bunch of them (ideally I would write a loop that does the same thing for every file).

Thank you!

You can first convert the timestamp column with to_datetime, then call set_index, and finally resample, aggregating with mean and std.

The code below uses the resample API introduced in pandas 0.18.0 (pandas 2.2 and later prefer the lowercase alias 'h' over 'H'):

import pandas as pd

#df is assumed to be already loaded, e.g. with pd.read_csv
#convert column to datetime
df['YY-MO-DD HH-MI-SS_SSS'] = pd.to_datetime(df['YY-MO-DD HH-MI-SS_SSS'], format='%Y-%m-%d %H:%M:%S:%f')

#set index from column
df = df.set_index('YY-MO-DD HH-MI-SS_SSS')

#resample and aggregate mean
print(df.resample('H').mean())
                       Temperature  Magnetic
YY-MO-DD HH-MI-SS_SSS                       
2015-12-07 20:00:00           22.6      40.0

print(df.resample('H').std())
                       Temperature   Magnetic
YY-MO-DD HH-MI-SS_SSS                        
2015-12-07 20:00:00       0.141421  10.889444

print(df.resample('D').mean())
                       Temperature  Magnetic
YY-MO-DD HH-MI-SS_SSS                       
2015-12-07                    22.6      40.0

print(df.resample('D').std())
                       Temperature   Magnetic
YY-MO-DD HH-MI-SS_SSS                        
2015-12-07                0.141421  10.889444
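
The steps above assume that df already exists. As a minimal end-to-end sketch, the snippet below builds it from a small in-memory sample before resampling. The comma delimiter and the third sample row are assumptions made for illustration; a real file may need a different sep argument to read_csv. The lowercase 'h' alias is used because newer pandas releases deprecate 'H'.

```python
import io

import pandas as pd

TIME_COL = "YY-MO-DD HH-MI-SS_SSS"  # header name taken from the question

# Hypothetical sample; the comma delimiter is an assumption about the file.
raw = io.StringIO(
    TIME_COL + ",Temperature,Magnetic\n"
    "2015-12-07 20:51:06:608,22.7,32.3\n"
    "2015-12-07 20:51:07:609,22.5,47.7\n"
    "2015-12-07 21:03:00:000,23.0,41.0\n"
)

df = pd.read_csv(raw)

# The milliseconds are separated by ':' rather than '.', so give an explicit format.
df[TIME_COL] = pd.to_datetime(df[TIME_COL], format="%Y-%m-%d %H:%M:%S:%f")
df = df.set_index(TIME_COL)

# One row per calendar hour / calendar day, with mean and std for each column.
hourly = df.resample("h").agg(["mean", "std"])
daily = df.resample("D").agg(["mean", "std"])

print(hourly)
print(daily)
```

An hour that contains a single reading gets NaN for std, because the sample standard deviation needs at least two values.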

If you want to aggregate everything into new columns, you can use:

#resample and aggregate mean and std
df1 = df.resample('H').agg(['mean','std'])
#flatten the two-level column names, e.g. ('Temperature', 'mean') -> 'Temperature mean'
df1.columns = [' '.join(col) for col in df1.columns]
print(df1.reset_index())
  YY-MO-DD HH-MI-SS_SSS  Temperature mean  Temperature std  Magnetic mean  \
0   2015-12-07 20:00:00              22.6         0.141421           40.0   

   Magnetic std  
0     10.889444  

df2 = df.resample('D').agg(['mean','std'])
df2.columns = [' '.join(col) for col in df2.columns]
print(df2.reset_index())
  YY-MO-DD HH-MI-SS_SSS  Temperature mean  Temperature std  Magnetic mean  \
0            2015-12-07              22.6         0.141421           40.0   

   Magnetic std  
0     10.889444  
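
Note that resample('H') produces one row per calendar hour (20:00 on each date separately). If the goal is one row for each of the 24 hours of the day, pooled across all dates as the question's first table suggests, grouping on the hour of the index is one possible approach. The sketch below uses made-up readings over two days; the values and the "Hour" label are illustrative only.

```python
import pandas as pd

# Hypothetical readings spanning two days, indexed by timestamp.
index = pd.to_datetime([
    "2015-12-07 20:51:06.608",
    "2015-12-07 20:51:07.609",
    "2015-12-08 20:10:00.000",
    "2015-12-08 21:05:00.000",
])
df = pd.DataFrame(
    {
        "Temperature": [22.7, 22.5, 23.1, 21.9],
        "Magnetic": [32.3, 47.7, 40.0, 38.0],
    },
    index=index,
)

# Group by hour of day (0-23), so 20:xx readings from every date share one row.
by_hour = df.groupby(df.index.hour).agg(["mean", "std"])
by_hour.index.name = "Hour"

print(by_hour)
```

Only hours that actually occur in the data appear in the result; reindexing with range(24) would add the missing ones as NaN rows.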

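If you need to write the result with to_csv, call reset_index first so that index=False does not drop the timestamps:

df1.reset_index().to_csv('myfile.csv', index=False)

In pandas versions before 0.18.0, the hourly mean was written with the how argument, which has since been deprecated and removed:

df['Average Temp'] = df.Temperature.resample('H', how='mean')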
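
The question also mentions wanting a loop over many files. A rough sketch of that, reusing the resample approach above, might look like the following. The function names, the folder layout, the "_hourly"/"_daily" output suffixes and the comma delimiter are all assumptions for illustration, not part of the original answer.

```python
import glob
import os

import pandas as pd

TIME_COL = "YY-MO-DD HH-MI-SS_SSS"  # header name taken from the question


def load(path):
    """Read one CSV (assumed comma-delimited) and index it by timestamp."""
    df = pd.read_csv(path)
    df[TIME_COL] = pd.to_datetime(df[TIME_COL], format="%Y-%m-%d %H:%M:%S:%f")
    return df.set_index(TIME_COL)


def summarize(df, freq):
    """Return mean and std per `freq` bucket with flat column names."""
    stats = df.resample(freq).agg(["mean", "std"])
    stats.columns = [" ".join(col) for col in stats.columns]
    return stats.reset_index()


def summarize_folder(in_dir, out_dir):
    """Write an hourly and a daily summary for every *.csv file in in_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for path in sorted(glob.glob(os.path.join(in_dir, "*.csv"))):
        stem = os.path.splitext(os.path.basename(path))[0]
        df = load(path)
        summarize(df, "h").to_csv(
            os.path.join(out_dir, stem + "_hourly.csv"), index=False
        )
        summarize(df, "D").to_csv(
            os.path.join(out_dir, stem + "_daily.csv"), index=False
        )
```

A call such as summarize_folder("data", "summaries") (hypothetical folder names) would then process every file in turn. Writing the summaries to a separate folder keeps a second run from picking up its own output files.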

