Most frequent value using pandas.DataFrame.resample

Question

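I am using pandas.DataFrame.resample to resample a grouped pandas DataFrame that has a timestamp index.

For one of the columns, I would like to resample so that the most frequent value is selected. So far I have only succeeded with NumPy functions such as np.max or np.sum.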

#generate test dataframe
import numpy as np
import pandas as pd

data = np.random.randint(0,10,(366,2))
index = pd.date_range(start=pd.Timestamp('1-Dec-2012'), periods=366, freq='D')
test = pd.DataFrame(data, index=index)

#generate group array
group =  np.random.randint(0,2,(366,))

#define how dictionary for resample
how_dict = {0: np.max, 1: np.min}

#perform grouping and resample
test.groupby(group).resample('48 h',how=how_dict)

The code above works because I used NumPy functions. However, I am not sure how to resample using the most frequent value. I tried defining a custom function:

def frequent(x):
    (value, counts) = np.unique(x, return_counts=True)
    return value[counts.argmax()]

However, if I now do:

how_dict = {0: np.max, 1: frequent}

I get an empty dataframe:

df = test.groupby(group).resample('48 h',how=how_dict)
df.shape

Answer 1

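Your resampling period is too short. When a group has no rows in one of the periods, your custom function is called on an empty array and raises a ValueError that pandas does not catch.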
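The failure can be reproduced without pandas at all. The following is a minimal sketch that calls the question's `frequent` function directly on a non-empty and an empty NumPy array; the sample values are made up for illustration.

```python
import numpy as np

def frequent(x):
    # Same function as in the question.
    values, counts = np.unique(x, return_counts=True)
    return values[counts.argmax()]

# A non-empty input works: 3 occurs twice, every other value once.
print(frequent(np.array([3, 1, 3, 2])))  # prints 3

# An empty input (what an empty resampling bin looks like) fails,
# because argmax is undefined for an empty array.
try:
    frequent(np.array([], dtype=int))
except ValueError as exc:
    print("ValueError:", exc)
```

So any bin that contains no rows for a given group is enough to trigger the error.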

It works when no group is empty, for example with regular groups:

In [8]: test.groupby(arange(366)%2).resample('48h',how=how_dict).head()
Out[8]: 
              0  1
0 2012-12-01  4  8
  2012-12-03  0  3
  2012-12-05  9  5
  2012-12-07  3  4
  2012-12-09  7  3

Or with a longer period:

In [9]: test.groupby(group).resample('122D',how=how_dict)
Out[9]: 
              0  1
0 2012-12-02  9  0
  2013-04-03  9  0
  2013-08-03  9  6
1 2012-12-01  9  3
  2013-04-02  9  7
  2013-08-02  9  1

EDIT

A workaround is to handle the empty case explicitly:

def frequent(x):
    if len(x)==0 : return -1
    (value, counts) = np.unique(x, return_counts=True)
    return value[counts.argmax()]

This gives:

In [11]: test.groupby(group).resample('48h',how=how_dict).head()
Out[11]: 
               0  1
0 2012-12-01   5  3
  2012-12-03   3  4
  2012-12-05 NaN -1
  2012-12-07   5  0
  2012-12-09   1  4
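Note that the `how=` keyword used throughout this thread was deprecated and later removed from `resample`, so the calls above only run on old pandas versions. Below is a rough sketch of one way to get a similar result on a recent pandas, assuming that grouping by the group labels together with `pd.Grouper(freq='48h')` is an acceptable substitute for `groupby(...).resample(...)`. The seed, the `rng` generator and the Series name `"group"` are choices made for this example, not part of the original post.

```python
import numpy as np
import pandas as pd

# Reproducible test data, same shape as in the question.
rng = np.random.default_rng(0)
index = pd.date_range(start="2012-12-01", periods=366, freq="D")
test = pd.DataFrame(rng.integers(0, 10, size=(366, 2)), index=index)

# Group labels aligned with the DataFrame index.
group = pd.Series(rng.integers(0, 2, size=366), index=index, name="group")

def frequent(x):
    # Guard against empty input, as in the workaround above.
    if len(x) == 0:
        return -1
    values, counts = np.unique(x, return_counts=True)
    return values[counts.argmax()]

# Group by label and by 48-hour bins on the index, then aggregate per column.
result = test.groupby([group, pd.Grouper(freq="48h")]).agg({0: "max", 1: frequent})
print(result.head())
```

As far as I can tell, this form only returns the (group, bin) combinations that actually contain rows, so empty bins are simply absent rather than filled with NaN and -1 as in the output above. The guard in `frequent` is kept anyway in case the function is reused elsewhere.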
