Grouping Period series values in pandas

Question

Following on from reading a CSV file with historical dates into pandas, I now have some CSV data of the form:

Object,Earliest Date
Object1,01/01/2000
Object2,01/01/1760
Object3,01/01/1520
...

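I've now read this into pandas (using Period to handle the historical dates) and created a series. I'm trying to group the series into decades, but I'm stuck on converting the Period values into a form that groupby accepts. So far I've tried the following (where s is the data read in with from_csv):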

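import pandas as pd

def dt_parse(s):
    # Parse a DD/MM/YYYY string into a daily Period; Period copes with pre-1677 dates
    try:
        d, m, y = s.split('/')
        return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')
    except (ValueError, AttributeError):  # malformed string or missing value
        return pd.NaT

s2 = s['Earliest Date'].apply(dt_parse)  # Create Period values
pi = pd.PeriodIndex(s2)
decades = pi.groupby(pd.Grouper(freq="120M")).count()  # 120 months = one decade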
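As a quick check of the parsing step on its own, here is a minimal sketch of the dt_parse function from the question (with the bare except narrowed to ValueError and AttributeError, an assumption about which failures matter). It shows that a Period happily represents a date like 1520, outside the nanosecond datetime64[ns] range of roughly 1677 to 2262, and that malformed input falls back to NaT.

```python
import pandas as pd

def dt_parse(text):
    # Split a DD/MM/YYYY string and build a daily Period; malformed input becomes NaT
    try:
        d, m, y = text.split("/")
        return pd.Period(year=int(y), month=int(m), day=int(d), freq="D")
    except (ValueError, AttributeError):  # wrong number of parts, or a non-string like NaN
        return pd.NaT

p = dt_parse("01/01/1520")
print(p, p.year)        # 1520-01-01 1520
print(dt_parse("n/a"))  # NaT
```

So the parsing itself is fine; the trouble in the attempts that follow is only in getting groupby to bin these Period values.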

This fails with:

 TypeError: Argument 'labels' has incorrect type (expected numpy.ndarray, got TimeGrouper)

Trying to group the Series instead:

 decades = s2.groupby(pd.Grouper(freq="120M")).count()

This fails with:

 TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

Trying to group it as a DataFrame:

df = pd.DataFrame(s2)
decades = df.groupby(pd.Grouper(freq="120M", key='Earliest Date')).size()

This fails with:

AttributeError: 'Index' object has no attribute 'to_timestamp'

I'm not sure what to try next!

Answer 1

The error messages and the pandas documentation will be your friends here.
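The second error message is a good example. pd.Grouper(freq=...) groups on the index, and in the question the Periods sit in the values while the index is a plain Index (from_csv uses the first column as the index by default). The sketch below uses a hypothetical three-row stand-in for s2 to show the difference; moving the Periods into the index yields a PeriodIndex, the kind of axis the message asks for. Whether freq="120M" then bins dates this old as intended is not checked here.

```python
import pandas as pd

# Hypothetical stand-in for s2: Period values indexed by object name,
# roughly as from_csv (index_col=0 by default) would leave them
s2 = pd.Series(
    [pd.Period(year=2000, month=1, day=1, freq="D"),
     pd.Period(year=1760, month=1, day=1, freq="D"),
     pd.Period(year=1520, month=1, day=1, freq="D")],
    index=["Object1", "Object2", "Object3"],
    name="Earliest Date",
)

# The Periods are the values; the index is a plain Index of strings,
# which is what "but got an instance of 'Index'" refers to
print(type(s2.index).__name__)  # Index

# Swap roles: object names become the values, the Periods become a PeriodIndex
by_period = pd.Series(s2.index, index=pd.PeriodIndex(s2))
print(type(by_period.index).__name__)  # PeriodIndex
```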

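I don't know whether your date column contains strictly unique dates. If it does, you can simply set it as the index and use pd.Grouper. Otherwise, define your own grouping function: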

# I'm assuming that df is the dataframe from pd.read_csv("/path/to/csv")
# and that there's a column named "Earliest Date"
# that is a Period or Datetime or something with a year attribute
def grouper(ind):
    # groupby calls this once per index label; return that row's decade
    y = df.loc[ind, 'Earliest Date'].year
    return y - (y % 10)

gb = df.groupby(by=grouper)
print(gb.size())
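To see the grouping function in action, here is a minimal sketch on a small hypothetical DataFrame built directly from Period values (the rows, including the extra Object4, are invented for illustration). It also shows a vectorized alternative that is not part of the answer above: compute every row's decade from the Periods' .year attribute and pass the resulting Series straight to groupby.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV in the question
df = pd.DataFrame({
    "Object": ["Object1", "Object2", "Object3", "Object4"],
    "Earliest Date": [
        pd.Period(year=2000, month=1, day=1, freq="D"),
        pd.Period(year=1760, month=1, day=1, freq="D"),
        pd.Period(year=1520, month=1, day=1, freq="D"),
        pd.Period(year=1765, month=6, day=15, freq="D"),
    ],
})

def grouper(ind):
    # Called once per index label; maps the row's year to the start of its decade
    y = df.loc[ind, "Earliest Date"].year
    return y - (y % 10)

by_function = df.groupby(by=grouper).size()

# Vectorized alternative: one decade key per row, computed from Period.year
decades = df["Earliest Date"].map(lambda p: p.year) // 10 * 10
by_key = df.groupby(decades).size()

print(by_function)
print(by_key)
```

Both variants bypass pd.Grouper, so nothing is converted to a Timestamp and dates such as 1520 stay as Periods throughout. With the sample above, both give one row for the 1520s, two for the 1760s and one for the 2000s.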

Answer 1: score 0, accepted 2016-05-03 07:37:44