
Python: condense a large DataFrame by averaging every 3 rows

I have a DataFrame with too many rows.
Example input:

No  col1  col2  col3  col4
1   0     5     6     8
2   0     5     7     8
3   0     7     5     2
4   0     4     4     5
.   .     .     .     .
.   .     .     .     .
.   .     .     .     .

output:

New_No  col1  col2  col3  col4
1       0     5.66  6     6
.       .     .     .     .
.       .     .     .     .
.       .     .     .     .
.       .     .     .     .
.       .     .     .     .

I want to condense every 3 rows into 1 row by taking their average (the mean of each group of 3 rows).
How can I do this?

You can group the rows into blocks of 3 with groupby and then take the mean of each block:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0, 10, (9, 5)))
>>> df
   0  1  2  3  4
0  9  7  9  8  8
1  5  5  5  5  7
2  6  5  3  3  0
3  5  2  9  3  3
4  6  0  5  9  4
5  9  8  9  2  3
6  6  9  8  7  2
7  8  1  9  7  6
8  7  9  2  2  8
>>> df.groupby(np.arange(len(df))//3).mean()
          0         1         2         3         4
0  6.666667  5.666667  5.666667  5.333333  5.000000
1  6.666667  3.333333  7.666667  4.666667  3.333333
2  7.000000  6.333333  6.333333  5.333333  5.333333

This works because integer-dividing the row positions by 3 gives the same label to each run of 3 consecutive rows:

>>> np.arange(len(df))//3
array([0, 0, 0, 1, 1, 1, 2, 2, 2])

and we can group on these labels. If the total number of rows isn't divisible by 3, the last group simply has fewer rows (1 or 2), and its mean is computed over just those rows, so the result is still correct.
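As a rough sketch, the approach above can be applied to the four rows shown in the question. The helper name `average_every_n` and the choice to keep `No` as the index (so it is not averaged) are assumptions made for illustration, not part of the original answer. Renumbering the result from 1 is only there to mimic the `New_No` column.

```python
import numpy as np
import pandas as pd


def average_every_n(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Collapse each run of n consecutive rows into one row of column means."""
    # Positional labels 0,0,0,1,1,1,... that do not depend on df's own index.
    labels = np.arange(len(df)) // n
    out = df.groupby(labels).mean()
    # Renumber from 1 to mimic the New_No column in the question (an assumption).
    out.index = pd.RangeIndex(1, len(out) + 1, name="New_No")
    return out


# The four rows from the question; "No" is the index so it is not averaged.
df = pd.DataFrame(
    {
        "col1": [0, 0, 0, 0],
        "col2": [5, 5, 7, 4],
        "col3": [6, 7, 5, 4],
        "col4": [8, 8, 2, 5],
    },
    index=pd.Index([1, 2, 3, 4], name="No"),
)

result = average_every_n(df, n=3)
print(result)
```

With 4 input rows, the first output row averages rows 1 to 3 and the second output row is just row 4 on its own, which illustrates the leftover-group case.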

