
Efficient way to find unique values within time windows in Python?

I have a large pandas dataframe that contains data similar to the image attached.

(image: enter image description here)

I want to get a count of how many unique TN exist within each 2 second window of the data. I've done this with a simple loop, but it is incredibly slow. Is there a better technique I can use to get this?

My original code is:

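# Collect the unique TN values for each 2-second window [tm - 2, tm)
uniqueTN = []
tmstart = 5400; tmstop = 86400
for tm in range(int(tmstart), int(tmstop), 2):
    # Each comparison needs its own parentheses: & binds tighter than >= and <
    df = rundf[(rundf['time'] >= (tm - 2)) & (rundf['time'] < tm)]
    uniqueTN.append(df['TN'].unique())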

This solution would be fine if the data set were not so large.

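You can do this with the groupby() and nunique() methods. Floor-dividing the time by 2 and multiplying back labels each row with the start of its 2-second window, and nunique() then counts the distinct TN values per label. Note that the first line overwrites the original time column.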

rundf['time'] = (rundf['time'] // 2) * 2
grouped = rundf.groupby('time')['TN'].nunique()
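As a rough sketch of the binning idea on a small made-up frame (the sample values and the `window` name are assumptions, not taken from the question): the window label is kept in a separate Series so the original `time` column stays intact. groupby() only returns windows that contain at least one row, so the sketch uses reindex() to add the empty ones back with a count of 0.

```python
import pandas as pd

# Hypothetical sample data; "time" is assumed to be plain seconds.
rundf = pd.DataFrame({
    "time": [5400.0, 5400.5, 5401.9, 5402.0, 5403.2, 5406.1],
    "TN":   [1, 2, 1, 3, 3, 4],
})

# Label each row with the start of its 2-second window (5400, 5402, ...).
window = ((rundf["time"] // 2) * 2).astype(int).rename("window")

# Count distinct TN values per window; empty windows are absent here.
counts = rundf.groupby(window)["TN"].nunique()

# Optionally add the empty windows back with a count of 0.
all_windows = range(5400, 5408, 2)
counts_full = counts.reindex(all_windows, fill_value=0)
print(counts_full)
```

Whether the zero-filled version is needed depends on whether downstream code expects one entry per window, as the original loop produced.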

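Another alternative is to use the resample() method of pandas and then the nunique() method. resample() needs a datetime-like column, so if time holds plain seconds it has to be converted first. Recent pandas versions also prefer the lowercase '2s' alias over '2S'.

rundf['time'] = pd.to_datetime(rundf['time'], unit='s')
grouped = rundf.resample('2s', on='time')['TN'].nunique()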
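A small sketch of the resample() route on made-up data, assuming `time` holds plain seconds as in the question's loop. The seconds are converted with `pd.to_datetime(..., unit="s")`, which treats them as offsets from 1970-01-01 so that resample() accepts the column. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; "time" is assumed to be plain seconds.
rundf = pd.DataFrame({
    "time": [5400.0, 5400.5, 5401.9, 5402.0, 5403.2, 5406.1],
    "TN":   [1, 2, 1, 3, 3, 4],
})

# Convert on a copy so the numeric "time" column in rundf is left untouched.
ts = rundf.assign(time=pd.to_datetime(rundf["time"], unit="s"))

# 2-second bins, labelled by their left edge; empty bins get a count of 0.
counts = ts.resample("2s", on="time")["TN"].nunique()
print(counts)
```

Unlike the groupby() version, resample() emits every window between the first and last timestamp, including those with no rows.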
