python pandas groupby agg get lag1
import pandas as pd

def lag1(x):
    return x[(len(x)-1)]

x=pd.Series([12,3,4,5,6])
lag1(x)
Out[65]: 6
dat.shape
Out[70]: (247619, 33)
d2=dat.groupby('PATID_CD').agg(lag1)
Traceback (most recent call last):
File "<ipython-input-71-f514757a3da8>", line 1, in <module>
d2=dat.groupby('PATID_CD').agg(lag1)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 4658, in aggregate
return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 4109, in aggregate
result = self._aggregate_generic(arg, *args, **kwargs)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 4133, in _aggregate_generic
return self._aggregate_item_by_item(func, *args, **kwargs)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 4162, in _aggregate_item_by_item
colg.aggregate(func, *args, **kwargs), data)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 3497, in aggregate
result = self._aggregate_named(func_or_funcs, *args, **kwargs)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 3627, in _aggregate_named
output = func(group, *args, **kwargs)
File "<ipython-input-64-be977293b7b9>", line 2, in lag1
return x[(len(x)-1)]
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\series.py", line 766, in __getitem__
result = self.index.get_value(self, key)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3103, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 114, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 23
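I don't know why my function isn't working: it raises a KeyError, which suggests that the name does not exist. This is a bit confusing. Am I going about this the right way, or is there another solution?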
dat.groupby('PATID_CD').agg('mean')
Out[73]:
MONTH_LOOKBACK_NR CCYYMM_CD ... ENG_SPOKEN EVENT_FL
PATID_CD ...
584 12.0 201556.500000 ... 1.0 0.0
4277 12.0 201556.500000 ... 1.0 0.0
I also tried:
dat.groupby('PATID_CD').agg(lambda x : x.iloc[-1,:])
This works, but I can't put this function into a list so that it is computed together with other functions:
def lag1(x):
    return x.iloc[-1,:]
d2=dat.groupby(dat['PATID_CD']).agg({'mean','max','min','std','skew', lambda x:len(x),kurtosis,lag1})
Traceback (most recent call last):
File "<ipython-input-86-ac95a8297b5c>", line 1, in <module>
d2=dat.groupby(dat['PATID_CD']).agg({'mean','max','min','std','skew', lambda x:len(x),kurtosis,lag1})
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 4658, in aggregate
return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 4089, in aggregate
result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\base.py", line 551, in _aggregate
_axis=_axis), None
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\base.py", line 596, in _aggregate_multiple_funcs
results.append(colg.aggregate(arg))
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 3485, in aggregate
(_level or 0) + 1)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 3558, in _aggregate_multiple_funcs
results[name] = obj.aggregate(func)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 3497, in aggregate
result = self._aggregate_named(func_or_funcs, *args, **kwargs)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 3627, in _aggregate_named
output = func(group, *args, **kwargs)
File "<ipython-input-85-6bbffa1ca952>", line 2, in lag1
return x.iloc[-1,:]
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1472, in __getitem__
return self._getitem_tuple(key)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 2013, in _getitem_tuple
self._has_valid_tuple(tup)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 220, in _has_valid_tuple
raise IndexingError('Too many indexers')
IndexingError: Too many indexers
This is the same as:
x=pd.Series([12,3,4,5,6])
lag1(x)
Traceback (most recent call last):
File "<ipython-input-85-6bbffa1ca952>", line 5, in <module>
lag1(x)
File "<ipython-input-85-6bbffa1ca952>", line 2, in lag1
return x.iloc[-1,:]
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1472, in __getitem__
return self._getitem_tuple(key)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 2013, in _getitem_tuple
self._has_valid_tuple(tup)
File "D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 220, in _has_valid_tuple
raise IndexingError('Too many indexers')
IndexingError: Too many indexers
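The "Too many indexers" error above is consistent with `agg` handing the function one column of one group at a time, i.e. a one-dimensional Series, which accepts only a single positional indexer. A minimal sketch of a version that could sit in a list next to the built-in aggregations follows. The toy frame and its `VALUE` column are made up for illustration, and a list is used rather than a set because a list is the form the pandas documentation describes.

```python
import pandas as pd

# Hypothetical toy frame standing in for `dat`
dat = pd.DataFrame({
    "PATID_CD": [584, 584, 584, 4277, 4277],
    "VALUE": [1.0, 2.0, 3.0, 10.0, 20.0],
})

def lag1(x):
    # x is a Series (one column of one group), so a single
    # positional indexer is enough to pick the last element
    return x.iloc[-1]

# Mix string aggregations and the custom function in one list
d2 = dat.groupby("PATID_CD").agg(["mean", "max", "min", lag1])
print(d2)
```

The result has two-level columns, so the custom aggregate is reachable as `d2[("VALUE", "lag1")]`.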
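You haven't explained what you are trying to accomplish, and it isn't clear from the code either. I don't have enough reputation here to add this as a comment, so consider: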
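1) Your first example runs on a Series that implicitly uses a RangeIndex, so x[(5-1)] == 6. Your second example appears to be a DataFrame indexed by PATID_CD. If PATID_CD.dtype is, say, object, you will get a KeyError exception simply because you passed the wrong argument type (int) to the pandas indexing expression DataFrame[label] (see the documentation).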
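To illustrate point 1: the `Int64HashTable` frames in the traceback suggest that the index involved holds integers, so one plausible reading of `KeyError: 23` is that `x[len(x)-1]` was treated as a lookup of the label 23, which that group's rows do not carry. The sketch below reproduces this behaviour on a made-up Series whose labels do not start at 0. The index values and the names `lag1_label` and `lag1_pos` are assumptions for the demonstration.

```python
import pandas as pd

# Hypothetical stand-in for one group's column: the values keep their
# original row labels (20..24) instead of 0..4.
s = pd.Series([12, 3, 4, 5, 6], index=[20, 21, 22, 23, 24])

def lag1_label(x):
    # Label-based lookup: asks for the label len(x) - 1, here 4
    return x[len(x) - 1]

def lag1_pos(x):
    # Position-based lookup: always the last element
    return x.iloc[-1]

try:
    lag1_label(s)
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

last_value = lag1_pos(s)
print(label_lookup_failed, last_value)
```

Using `.iloc` makes the function independent of whatever labels a group happens to carry.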
2) If you only want to extract the last row of each group, write the groupby call like this:
dat.groupby('PATID_CD').apply(lambda x: x.iloc[-1, :])
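As an alternative to the `apply` call above (a swapped-in technique, not the one given in this answer), `GroupBy.tail(1)` also returns the last row of each group. The sketch below uses a made-up frame with hypothetical `MONTH` and `VALUE` columns.

```python
import pandas as pd

# Hypothetical toy frame standing in for `dat`
dat = pd.DataFrame({
    "PATID_CD": [584, 584, 4277, 4277],
    "MONTH": [1, 2, 1, 2],
    "VALUE": [1.0, 2.0, 10.0, 20.0],
})

# Last row of each group, keeping the original row labels
last_rows = dat.groupby("PATID_CD").tail(1)

# Same rows, re-indexed by the group key
last_by_key = last_rows.set_index("PATID_CD")
print(last_by_key)
```

Note that `tail(1)` takes the last row as stored, so the frame would need to be sorted beforehand if "last" is meant chronologically.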