Python GroupBy時間間隔

Question

我想請您幫忙對熊貓數據框執行操作。

我的初始數據幀如下：

我想每隔30秒重塑一次並計算每組的平均值。

我使用了以下內容：

 df_a['avgValue']= df_a['value'].groupby([df_a['id_A'],df_a['course'], pd.TimeGrouper(freq='30S')]).transform(np.mean)

我得到以下信息：

但是ts_A未按30秒分組。 例如，第一行和第二行應合並為一個，以獲得預期的結果：

id_A               ts_A course weight avgValue
id1 2017-04-27 01:35:30 cotton 3.5    150.000
id1 2017-04-27 01:36:00 cotton 3.5    416.000
...

我的問題是：如何更改上面的代碼以獲得預期的結果？

提前謝謝了。 最好的問候，卡洛

Answer 1

我認為您需要（假設ts_A設置為DatetimeIndex ） GroupBy.mean並省略transform功能：

#if not datetimeindex
#df['ts_A'] = pd.to_datetime(df['ts_A'])
#df = df.set_index('ts_A')


df = df_a['value'].groupby([df_a['id_A'],
                            df_a['course'], 
                            df_a['weight'],
                            pd.TimeGrouper(freq='30S')]).mean().reset_index()

要么：

df = df_a.groupby(['id_A','course','weight', 
                   pd.TimeGrouper(freq='30S')])['value'].mean().reset_index()
print (df)
  id_A       course  weight                ts_A       value
0  id1       cotton     3.5 2017-04-27 01:35:30  150.000000
1  id1       cotton     3.5 2017-04-27 01:36:00  416.666667
2  id1       cotton     3.5 2017-04-27 01:36:30  700.000000
3  id1       cotton     3.5 2017-04-27 01:37:00  950.000000
4  id2  cotton blue     5.0 2017-04-27 02:35:30  150.000000
5  id2  cotton blue     5.0 2017-04-27 02:36:00  450.000000
6  id2  cotton blue     5.0 2017-04-27 02:36:30  520.666667
7  id2  cotton blue     5.0 2017-04-27 02:37:00  610.000000

resample解決方案：

df = df_a.groupby(['id_A','course','weight'])['value'].resample('30S').mean().reset_index()
print (df)
  id_A       course  weight                ts_A       value
0  id1       cotton     3.5 2017-04-27 01:35:30  150.000000
1  id1       cotton     3.5 2017-04-27 01:36:00  416.666667
2  id1       cotton     3.5 2017-04-27 01:36:30  700.000000
3  id1       cotton     3.5 2017-04-27 01:37:00  950.000000
4  id2  cotton blue     5.0 2017-04-27 02:35:30  150.000000
5  id2  cotton blue     5.0 2017-04-27 02:36:00  450.000000
6  id2  cotton blue     5.0 2017-04-27 02:36:30  520.666667
7  id2  cotton blue     5.0 2017-04-27 02:37:00  610.000000

設定：

d = {'weight': {0: 3.5, 1: 3.5, 2: 3.5, 3: 3.5, 4: 3.5, 5: 3.5, 6: 3.5, 7: 3.5, 8: 3.5, 9: 3.5, 10: 5.0, 11: 5.0, 12: 5.0, 13: 5.0, 14: 5.0, 15: 5.0, 16: 5.0, 17: 5.0, 18: 5.0, 19: 5.0}, 'value': {0: 100, 1: 200, 2: 350, 3: 400, 4: 500, 5: 600, 6: 700, 7: 800, 8: 900, 9: 1000, 10: 100, 11: 200, 12: 450, 13: 300, 14: 600, 15: 500, 16: 522, 17: 540, 18: 320, 19: 900}, 'ts_A': {0: '2017-04-27 01:35:40', 1: '2017-04-27 01:35:50', 2: '2017-04-27 01:36:00', 3: '2017-04-27 01:36:10', 4: '2017-04-27 01:36:20', 5: '2017-04-27 01:36:30', 6: '2017-04-27 01:36:40', 7: '2017-04-27 01:36:50', 8: '2017-04-27 01:37:00', 9: '2017-04-27 01:37:10', 10: '2017-04-27 02:35:40', 11: '2017-04-27 02:35:50', 12: '2017-04-27 02:36:00', 13: '2017-04-27 02:36:10', 14: '2017-04-27 02:36:20', 15: '2017-04-27 02:36:30', 16: '2017-04-27 02:36:40', 17: '2017-04-27 02:36:50', 18: '2017-04-27 02:37:00', 19: '2017-04-27 02:37:10'}, 'course': {0: 'cotton', 1: 'cotton', 2: 'cotton', 3: 'cotton', 4: 'cotton', 5: 'cotton', 6: 'cotton', 7: 'cotton', 8: 'cotton', 9: 'cotton', 10: 'cotton blue', 11: 'cotton blue', 12: 'cotton blue', 13: 'cotton blue', 14: 'cotton blue', 15: 'cotton blue', 16: 'cotton blue', 17: 'cotton blue', 18: 'cotton blue', 19: 'cotton blue'}, 'id_A': {0: 'id1', 1: 'id1', 2: 'id1', 3: 'id1', 4: 'id1', 5: 'id1', 6: 'id1', 7: 'id1', 8: 'id1', 9: 'id1', 10: 'id2', 11: 'id2', 12: 'id2', 13: 'id2', 14: 'id2', 15: 'id2', 16: 'id2', 17: 'id2', 18: 'id2', 19: 'id2'}}

df_a = pd.DataFrame(d)
df_a['ts_A'] = pd.to_datetime(df_a['ts_A'])
df_a = df_a.set_index('ts_A')
print (df_a)
                          course id_A  value  weight
ts_A                                                
2017-04-27 01:35:40       cotton  id1    100     3.5
2017-04-27 01:35:50       cotton  id1    200     3.5
2017-04-27 01:36:00       cotton  id1    350     3.5
2017-04-27 01:36:10       cotton  id1    400     3.5
2017-04-27 01:36:20       cotton  id1    500     3.5
2017-04-27 01:36:30       cotton  id1    600     3.5
2017-04-27 01:36:40       cotton  id1    700     3.5
2017-04-27 01:36:50       cotton  id1    800     3.5
2017-04-27 01:37:00       cotton  id1    900     3.5
2017-04-27 01:37:10       cotton  id1   1000     3.5
2017-04-27 02:35:40  cotton blue  id2    100     5.0
2017-04-27 02:35:50  cotton blue  id2    200     5.0
2017-04-27 02:36:00  cotton blue  id2    450     5.0
2017-04-27 02:36:10  cotton blue  id2    300     5.0
2017-04-27 02:36:20  cotton blue  id2    600     5.0
2017-04-27 02:36:30  cotton blue  id2    500     5.0
2017-04-27 02:36:40  cotton blue  id2    522     5.0
2017-04-27 02:36:50  cotton blue  id2    540     5.0
2017-04-27 02:37:00  cotton blue  id2    320     5.0
2017-04-27 02:37:10  cotton blue  id2    900     5.0

Answer 2

首先，應將datetime列設置為索引，因為TimeGrouper會對索引進行操作。

df = df_a.set_index('ts_A')

然后計算調整后的groupby操作。 最好先對感興趣的列進行分組，然后在目標變量上計算平均值。

results = df.groupby(['id_A','course','weight', pd.TimeGrouper(freq='30S')])['value'].mean()

Python GroupBy時間間隔

問題描述

2 個解決方案

解決方案1
3 已采納 2017-07-20 12:52:54

解決方案2
2 2017-07-20 12:57:22

Python GroupBy時間間隔

問題描述

2 個解決方案

解決方案1 3 已采納 2017-07-20 12:52:54

解決方案2 2 2017-07-20 12:57:22

解決方案1
3 已采納 2017-07-20 12:52:54

解決方案2
2 2017-07-20 12:57:22