I have a CSV file with 25,000 rows. I want to write the average of every 30 rows to another CSV file.
Here is an example with 9 rows averaged in groups of 3, so the new CSV file has 3 rows (3, 1, 2):
|  H |
======
|  1 |---\
|  3 |   |--->| 3 |
|  5 |---/
| -1 |---\
|  3 |   |--->| 1 |
|  1 |---/
|  0 |---\
|  5 |   |--->| 2 |
|  1 |---/
What I did:
import numpy as np
import pandas as pd
m_path = "file.csv"
m_df = pd.read_csv(m_path, usecols=['Col-01'])
m_arr = np.array([])
temp = m_df.to_numpy()
step = 30
# Start at 0 so the first row is part of the first group
for i in range(0, len(temp), step):
    # np.append returns a new array, so the result must be assigned back to m_arr
    m_arr = np.append(m_arr, np.average(temp[i:i + step]))
m_df = pd.DataFrame({'Column1': m_arr})
m_df.to_csv('AVG.csv')
This works well, but is there any other way to do this?
You can integer-divide the index by step to label consecutive groups of rows, then pass those labels to groupby and aggregate with mean:
step = 30
m_df = pd.read_csv(m_path, usecols=['Col-01'])
df = m_df.groupby(m_df.index // step).mean()
Or, if the index is not the default 0..n-1 range, build positional labels with np.arange:
df = m_df.groupby(np.arange(len(m_df)) // step).mean()
With the 9-row sample data from the question (column H):
step = 3
df = m_df.groupby(m_df.index // step).mean()
print(df)
     H
0  3.0
1  1.0
2  2.0
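As a self-contained check of the groupby approach, here is a minimal sketch using the question's 9 sample values. The helper name block_means and the in-memory frame are assumptions for illustration; the real data would come from pd.read_csv.

```python
import numpy as np
import pandas as pd


def block_means(df: pd.DataFrame, step: int) -> pd.DataFrame:
    """Average every `step` consecutive rows (hypothetical helper).

    A trailing block with fewer than `step` rows is averaged as well.
    """
    # Positional labels 0,0,0,1,1,1,... do not depend on the frame's index
    labels = np.arange(len(df)) // step
    return df.groupby(labels).mean()


# Sample column taken from the question's diagram
sample = pd.DataFrame({"H": [1, 3, 5, -1, 3, 1, 0, 5, 1]})
result = block_means(sample, step=3)
print(result["H"].tolist())  # [3.0, 1.0, 2.0]
```

To write the averages out, result.to_csv('AVG.csv', index=False) should produce a file without the extra index column.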
You can compute a rolling mean with DataFrame.rolling and then keep only the last row of each non-overlapping window by slicing:
df.rolling(3).mean()[2::3].reset_index(drop=True)
a
0 3.0
1 1.0
2 2.0
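One difference worth knowing: when the row count is not a multiple of the window, the rolling-and-slice approach skips the leftover rows, while groupby averages them as a shorter final group. A small sketch, assuming a 10-row frame made by appending one extra value (7) to the sample data:

```python
import pandas as pd

step = 3
# 10 rows: three full groups plus one leftover row (the 7 is made up for illustration)
df = pd.DataFrame({"a": [1, 3, 5, -1, 3, 1, 0, 5, 1, 7]})

# Rolling mean, keeping rows 2, 5, 8, ... (the last row of each full window)
rolled = df.rolling(step).mean()[step - 1::step].reset_index(drop=True)

# Integer-division groups: the leftover row forms its own group
grouped = df.groupby(df.index // step).mean()

print(rolled["a"].tolist())   # [3.0, 1.0, 2.0]
print(grouped["a"].tolist())  # [3.0, 1.0, 2.0, 7.0]
```

With 25,000 rows and a step of 30, there are 10 leftover rows, so the two approaches would give results of different lengths.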
It might be simpler to do it all in numpy.
import numpy as np
x = np.array([1, 3, 5, -1, 3, 1, 0, 5, 1 ])
steps = 3
for i in range(0, len(x), steps):
    avg = np.average(x[i:i+steps])
    print(f'average starting at el {i} is {avg}')
This prints:
average starting at el 0 is 3.0
average starting at el 3 is 1.0
average starting at el 6 is 2.0
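The loop above can also be replaced by a reshape, which avoids Python-level iteration. This is only a sketch: the helper name block_means_np is hypothetical, and averaging the leftover rows as a shorter last block is an assumption about the desired behaviour.

```python
import numpy as np


def block_means_np(x: np.ndarray, step: int) -> np.ndarray:
    """Average consecutive blocks of `step` elements of a 1-D array (hypothetical helper)."""
    n_full = len(x) // step * step  # number of elements that fill complete blocks
    # One row per block, then average along each row
    means = x[:n_full].reshape(-1, step).mean(axis=1)
    if n_full < len(x):
        # Assumption: leftover elements are averaged as a shorter final block
        means = np.append(means, x[n_full:].mean())
    return means


x = np.array([1, 3, 5, -1, 3, 1, 0, 5, 1])
print(block_means_np(x, 3).tolist())  # [3.0, 1.0, 2.0]
```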