
Resample a pandas dataframe by an arbitrary factor

Pandas resampling is really convenient if your index is a datetime index, but I haven't found an easy way to resample by an arbitrary factor. E.g., just treat each index value as an arbitrary position, and resample the dataframe so that the result is 4x shorter (while being more intelligent about it than just taking every 4th data point).

This would be useful for anyone working with data sampled on a much shorter timescale than typical datetimes. For example, in my case I want to resample an audio vector from 44 kHz to 11 kHz. Right now I have to use scipy's "decimate" function and then convert the result back to a dataframe (using dataframe.apply didn't work because decimate changes the length of the data).
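The workaround described above can be sketched roughly as follows. The helper name decimate_frame and the random two-channel "audio" frame are made up for illustration. The sketch assumes an integer downsampling factor and labels each output row with the index value of the first row in its group, which is only one possible convention.

```python
import numpy as np
import pandas as pd
from scipy.signal import decimate

def decimate_frame(df, factor):
    """Low-pass filter and downsample every column of df by an integer factor.

    Hypothetical helper: wraps scipy.signal.decimate and rebuilds a DataFrame.
    """
    # decimate applies an anti-aliasing filter, then keeps every factor-th sample
    values = decimate(df.to_numpy(dtype=float), factor, axis=0)
    # reuse every factor-th original index label for the shortened frame
    return pd.DataFrame(values, index=df.index[::factor], columns=df.columns)

# illustrative data only: 4400 samples of noise in two channels
rng = np.random.default_rng(0)
audio = pd.DataFrame({"left": rng.standard_normal(4400),
                      "right": rng.standard_normal(4400)})
audio_11k = decimate_frame(audio, 4)
print(audio_11k.shape)
```

Working on the underlying array with axis=0 avoids the length mismatch that dataframe.apply runs into, because the new frame is built only after the data has been shortened.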

Anyone have any suggestions for how to accomplish this?

You can use a DatetimeIndex to resample high-frequency data (up to nanosecond precision; caveat: I believe this is only available in the upcoming 0.13 release). I've successfully used pandas to resample electrophysiological data in the 24 kHz range. Here's an example:

# assumes: from pandas import date_range, Series; from numpy.random import randn
In [97]: index = date_range('1/1/2001 00:00:00', '1/1/2001 00:00:01', freq='22727N')

In [98]: index
Out[98]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2001-01-01 00:00:00, ..., 2001-01-01 00:00:00.999988]
Length: 44001, Freq: 22727N, Timezone: None

In [99]: s = Series(randn(index.size), index=index)

In [100]: s.head(10)
Out[100]:
2001-01-01 00:00:00          -0.820
2001-01-01 00:00:00.000022   -1.141
2001-01-01 00:00:00.000045    1.577
2001-01-01 00:00:00.000068   -1.031
2001-01-01 00:00:00.000090    0.343
2001-01-01 00:00:00.000113   -0.424
2001-01-01 00:00:00.000136   -0.753
2001-01-01 00:00:00.000159    0.411
2001-01-01 00:00:00.000181    0.238
2001-01-01 00:00:00.000204    1.048
Freq: 22727N, dtype: float64

In [101]: s.resample(s.index.freq * 4, how='mean')
Out[101]:
2001-01-01 00:00:00          -0.354
2001-01-01 00:00:00.000090   -0.106
2001-01-01 00:00:00.000181    0.245
2001-01-01 00:00:00.000272    0.568
2001-01-01 00:00:00.000363    0.047
2001-01-01 00:00:00.000454   -0.560
2001-01-01 00:00:00.000545   -0.485
2001-01-01 00:00:00.000636   -0.271
2001-01-01 00:00:00.000727   -0.457
2001-01-01 00:00:00.000818    0.078
2001-01-01 00:00:00.000909    0.394
2001-01-01 00:00:00.000999    0.185
2001-01-01 00:00:00.001090   -0.441
2001-01-01 00:00:00.001181    0.300
2001-01-01 00:00:00.001272   -0.521
...
2001-01-01 00:00:00.998715   -0.045
2001-01-01 00:00:00.998806   -0.044
2001-01-01 00:00:00.998897    0.090
2001-01-01 00:00:00.998988    0.748
2001-01-01 00:00:00.999078   -0.179
2001-01-01 00:00:00.999169    0.451
2001-01-01 00:00:00.999260   -1.041
2001-01-01 00:00:00.999351   -0.476
2001-01-01 00:00:00.999442   -0.234
2001-01-01 00:00:00.999533   -0.719
2001-01-01 00:00:00.999624   -0.606
2001-01-01 00:00:00.999715   -0.032
2001-01-01 00:00:00.999806   -0.296
2001-01-01 00:00:00.999897   -0.044
2001-01-01 00:00:00.999988   -0.951
Freq: 90908N, Length: 11001

You can pass a callable to how, which would allow you to "do something more intelligent". pandas defaults to taking the average over the period given (in this case, that's the average over each 90908 ns bin, i.e. each chunk of 4 consecutive samples).
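For readers on a recent pandas, where resample no longer accepts the how argument and the aggregation is chained as a method instead, the same idea might look like the sketch below. The variable names, the 44000-sample random series, and the peak-amplitude callable are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# one sample every 22727 ns, roughly 44 kHz (illustrative random data)
step = pd.Timedelta(22727, unit="ns")
index = pd.date_range("2001-01-01", periods=44000, freq=step)
s = pd.Series(np.random.default_rng(0).standard_normal(index.size), index=index)

# default-style aggregation: mean over each bin of 4 sample periods
mean_4 = s.resample(4 * step).mean()

# a custom callable, here the peak absolute value within each bin
peak_4 = s.resample(4 * step).apply(lambda chunk: chunk.abs().max())

print(len(s), len(mean_4))
```

Any function that reduces a chunk to a single value can take the place of the lambda.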

I have a dirty yet effective answer to propose:

First, copy each row's position into another column. If your dataframe is called data:

# store each row's position (0, 1, 2, ...) in a helper column
data['num'] = range(len(data))

Then resample with pandas' boolean indexing, keeping every frequency-th row (frequency here is the integer downsampling factor, e.g. 4):

data_resampled = data[data['num'] % frequency == 0]

It might be possible to do this without copying the index column, and most probably a dedicated command exists that makes this more elegant. Still, this works.
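One candidate for such a dedicated command is positional slicing with iloc, which needs no helper column. This is a minimal sketch with a made-up ten-row frame, and frequency is again assumed to be an integer factor.

```python
import pandas as pd

# illustrative frame with a non-numeric index
data = pd.DataFrame({"value": range(10)}, index=list("abcdefghij"))
frequency = 4  # keep one row out of every 4

# slice by position, so the index labels can be anything
data_resampled = data.iloc[::frequency]
print(data_resampled)
```

Like the mask above, this only picks every frequency-th row. It does no averaging or filtering.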

OK, here is a perhaps more Pythonic way, in one line, for a non-datetime index:

data_resampled = data.reset_index()[data.reset_index()['index']%frequency==0]

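This way you avoid the for loop, and you get an 'index' column (the name reset_index() gives to an unnamed index) that you can discard afterward if needed. Note that this assumes the original index holds consecutive integers.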
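If plain row-skipping is too crude, a position-based groupby can average each block of rows instead, which is closer to what the question asks for. The helper name downsample_mean and the sample frame below are hypothetical, and the sketch assumes an integer factor. A final partial block is averaged over however many rows it contains.

```python
import numpy as np
import pandas as pd

def downsample_mean(df, factor):
    """Average every `factor` consecutive rows, ignoring the index values."""
    # rows 0..factor-1 fall in bin 0, the next `factor` rows in bin 1, and so on
    bins = np.arange(len(df)) // factor
    out = df.groupby(bins).mean()
    # label each bin with the index value of its first row
    out.index = df.index[::factor]
    return out

# illustrative data: ten rows with an arbitrary, non-datetime index
data = pd.DataFrame({"x": np.arange(10.0)}, index=list("abcdefghij"))
small = downsample_mean(data, 4)
print(small)
```

Swapping .mean() for .agg(some_function) would allow a different reduction per block.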
