Grouped data analysis in Python with pandas

Question

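I have a large DataFrame. One of the columns is time (just integers representing seconds). I want to do a groupby where each group represents 2 seconds of data. That would let me apply the std or mean function to all groups in one line of code. The goal is to be able to discard the time increments whose data does not meet a certain criterion. The pseudocode below hopefully represents what I am trying to do. Please excuse my sloppiness, as I am new to pandas.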

 grouped = df.groupBy(df['time'])  #grouped for say 2 second increments. 
 groupStd = grouped.std()
 df.drop( items in group where groupStd> val)
 convert back to dataframe after the rows have been removed.

If anyone could help me fill in the blanks, that would be very helpful. Thanks!

Answer 1

You could try:

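import pandas as pd

df = pd.DataFrame([[22, 18], [21, 23], [20, 17], [23, 45]], columns=['time', 'value'])

def sub_group_hash(x):
    # Map each time to the start of its 2-second bin (20, 21 -> 20; 22, 23 -> 22).
    return x // 2 * 2

# Group the remaining columns by the 2-second bin each row falls into.
grouped = df.drop('time', axis=1).groupby(sub_group_hash(df['time']))
group_mean = grouped.mean()  # use grouped.std() for the per-bin standard deviation
print(group_mean)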
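To also cover the "drop the groups whose std is too large" part of the question, one possible approach is to broadcast each bin's standard deviation back onto its rows with `transform` and then filter with a boolean mask. This is only a sketch: the sample data, the `value` column name, and the `max_std` threshold (standing in for `val` in the question) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: 'time' in whole seconds, 'value' is the measurement.
df = pd.DataFrame(
    {"time": [20, 21, 22, 23, 24, 25],
     "value": [17.0, 23.0, 18.0, 45.0, 30.0, 31.0]}
)

max_std = 5.0  # assumed threshold; stands in for `val` in the question

# Label each row with the start of its 2-second bin (20, 20, 22, 22, 24, 24).
bins = df["time"] // 2 * 2

# transform("std") gives every row the standard deviation of its own bin.
row_std = df.groupby(bins)["value"].transform("std")

# Keep only rows whose bin is at or below the threshold. A bin with a single
# row has a NaN std, so the comparison is False and that row is dropped too.
kept = df[row_std <= max_std].reset_index(drop=True)
print(kept)
```

The result is already a plain DataFrame, so no conversion back is needed. `df.groupby(bins).filter(...)` with a function that returns True for the groups to keep should give the same rows, at the cost of calling a Python function once per group.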

Solution 1
0 2015-04-30 17:47:49
