
Grouped data analysis in Python with pandas

I have a large DataFrame. One of the columns is time (just integers representing seconds). I would like to do a groupby where each group represents, say, 2 seconds of data. Doing this would allow me to use the std or mean functions on all of the groups with one line of code. The goal is to be able to throw out time increments of data that don't meet a certain criterion. The following pseudocode hopefully represents what I want to do. Please excuse the crudeness, as I'm pretty new to pandas.

 grouped = df.groupby(df['time'])  # grouped into, say, 2-second increments
 groupStd = grouped.std()
 # drop the rows belonging to groups where groupStd > val
 # then convert back to a DataFrame after the rows have been removed

If someone could help me fill in the blanks that would be extremely helpful. Thank you!

You can try:

import pandas as pd

df = pd.DataFrame([[22, 18], [21, 23], [20, 17], [23, 45]], columns=['time', 'value'])

def sub_group_hash(x):
    # Map each time to the start of its 2-second bucket, e.g. 21 -> 20, 23 -> 22.
    return (x // 2) * 2

# Group the non-time columns by the 2-second bucket of the time column.
grouped = df.drop('time', axis=1).groupby(sub_group_hash(df['time']))
groupStd = grouped.std()
print(groupStd)
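To complete the original goal of discarding the 2-second increments whose spread is too large, you could follow the same bucketing idea with `GroupBy.filter`, which keeps only the rows of groups satisfying a predicate. This is a minimal sketch; the threshold `val` and the column names are assumptions, not from the original post.

```python
import pandas as pd

# Sample data: 'time' in seconds plus one measurement column (names assumed).
df = pd.DataFrame([[22, 18], [21, 23], [20, 17], [23, 45]],
                  columns=['time', 'value'])

val = 10  # hypothetical threshold on the per-group standard deviation

# Bin each time into its 2-second bucket, then keep only rows whose
# bucket's standard deviation of 'value' is at or below the threshold.
bins = (df['time'] // 2) * 2
filtered = df.groupby(bins).filter(lambda g: g['value'].std() <= val)
print(filtered)
```

Here the bucket starting at 22 (values 18 and 45) has a standard deviation above 10 and is dropped, while the bucket starting at 20 (values 23 and 17) is kept. `filter` returns an ordinary DataFrame, so no conversion step is needed afterwards.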
