
How to extract Year, Week and Month from the Datetime column in my data?

Originally, these were the dtypes of my df:

All Markets      object
Category         object
Brand            object
Target Age       object
Segment          object
Sub_brand        object
Form             object
Week             object
Sales           float64
Units           float64
TDP             float64
No_of_Stores    float64
dtype: object

The "Week" column originally looked like this:

01/06/18
01/13/18
01/20/18

I converted the Week column to datetime by writing:

df['Week']=pd.to_datetime(df['Week'])
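The strings above look like month/day/two-digit-year, so one option is to pass an explicit `format` instead of letting pandas infer it. This is a minimal sketch under that assumption; the three-row frame is a made-up sample, not the real data.

```python
import pandas as pd

# Hypothetical sample mirroring the strings shown in the question.
df = pd.DataFrame({"Week": ["01/06/18", "01/13/18", "01/20/18"]})

# %m/%d/%y means month/day/two-digit year. With an explicit format,
# a row that does not match raises an error instead of being guessed at.
df["Week"] = pd.to_datetime(df["Week"], format="%m/%d/%y")

first_week = df["Week"].iloc[0]
print(df["Week"].dtype)
print(first_week)
```

If the real column used day/month order instead, the format string would need to be `%d/%m/%y`.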

Now the dtypes look like this:

All Markets             object
Category                object
Brand                   object
Target Age              object
Segment                 object
Sub_brand               object
Form                    object
Week            datetime64[ns]
Sales                  float64
Units                  float64
TDP                    float64
No_of_Stores           float64
dtype: object

I created a new DataFrame in which I summed the dataset by week:

df_all_sales=df.groupby(["Week"]).sum()
df_all_sales


Week        Sales          Units        TDP       No_of_Stores
2018-01-06  3.524456e+07   2328906.175  1860.108  1068546.48
2018-01-13  3.108469e+07   2045011.831  1745.664  1068606.48
2018-01-20  2.603041e+07   1748838.880  1631.943  1067000.64
2018-01-27  2.453881e+07   1582999.340  1582.581  1067461.32
2018-02-03  2.440598e+07   1639932.560  1584.747  1067419.68
...         ...            ...          ...       ...
2020-06-27  6.205612e+06   373815.090   704.343   998781.74
2020-07-04  7.054332e+06   427955.540   779.252   999300.06
2020-07-11  7.137108e+06   438363.230   783.708   998931.23
2020-07-18  7.545068e+06   465413.700   822.505   998794.83
2020-07-25  7.329634e+06   458153.310   807.745   998794.83
134 rows × 4 columns

But when I tried to extract the year, month and date from the Week column, I got an error. I tried this code:

import datetime as dt    
df_all_sales['year'] = pd.DatetimeIndex(df_all_sales['Week']).year
df_all_sales['month'] = pd.DatetimeIndex(df_all_sales['Week']).month

This is the error:

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Week'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-29-06aaa1163a18> in <module>
      1 # extracting date, month and year from the datetime
----> 2 df_all_sales['year'] = pd.DatetimeIndex(df_all_sales['Week']).year
      3 df_all_sales['month'] = pd.DatetimeIndex(df_all_sales['Week']).month
      4 
      5 #df_all_sales['year'] = df_all_sales['Week'].dt.year

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 'Week'

I also tried this code:

df_all_sales['year'] = df_all_sales['Week'].dt.year
df_all_sales['month'] = df_all_sales['Week'].dt.month

This is the error:

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Week'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-30-06b216de9275> in <module>
      3 #df_all_sales['month'] = pd.DatetimeIndex(df_all_sales['Week']).month
      4 
----> 5 df_all_sales['year'] = df_all_sales['Week'].dt.year
      6 df_all_sales['month'] = df_all_sales['Week'].dt.month

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 'Week'

Use the dt accessor on the original df, where Week is still a column:

df['Week'] = pd.to_datetime(df['Week'])
df['year'] = df['Week'].dt.year
df['month'] = df['Week'].dt.month

Example:

# Before:
>>> df
       Week
0  01/06/18
1  01/13/18
2  01/20/18

# After:
>>> df
        Week  year  month
0 2018-01-06  2018      1
1 2018-01-13  2018      1
2 2018-01-20  2018      1

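Try to do this before the groupby; otherwise Week is no longer a column but the index.

Otherwise, read the values from the index after the groupby:

df['Week'] = pd.to_datetime(df['Week'])
# Assign the result; groupby does not modify df in place
df_all_sales = df.groupby(["Week"]).sum()

df_all_sales['year'] = df_all_sales.index.year
df_all_sales['month'] = df_all_sales.index.month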
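To see why `df_all_sales['Week']` raises `KeyError`, it can help to check where "Week" ends up after the groupby. The sketch below uses a made-up three-row frame (the `Brand` and `Sales` values are invented for illustration) and also shows `as_index=False`, which keeps the grouping key as an ordinary column.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataset.
df = pd.DataFrame({
    "Week": pd.to_datetime(["01/06/18", "01/06/18", "01/13/18"], format="%m/%d/%y"),
    "Brand": ["A", "B", "A"],
    "Sales": [10.0, 5.0, 7.0],
})

# Default groupby: the key moves into the index of the result.
# numeric_only=True skips the text column instead of trying to "sum" strings.
df_all_sales = df.groupby("Week").sum(numeric_only=True)
week_is_column = "Week" in df_all_sales.columns
index_name = df_all_sales.index.name

# as_index=False keeps "Week" as a regular column, so the .dt accessor works.
flat = df.groupby("Week", as_index=False).sum(numeric_only=True)
flat["year"] = flat["Week"].dt.year

print(week_is_column, index_name)
print(flat)
```

Here `week_is_column` is `False` and `index_name` is `"Week"`, which matches the traceback: the lookup fails in `self.columns.get_loc(key)` because "Week" is only an index label.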

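Updated with help from @HenryEcker

After your groupby, "Week" has probably become the index of your DataFrame, so you first need to call reset_index before trying to access the "Week" column: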

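df_all_sales = df_all_sales.reset_index()
df_all_sales["Year"] = df_all_sales["Week"].dt.year
df_all_sales["Month"] = df_all_sales["Week"].dt.month
# Series.dt.week was deprecated in pandas 1.1 and removed in 2.0; use isocalendar() instead.
# Overwrite "Week" last, because the lines above still need the datetime values.
df_all_sales["Week"] = df_all_sales["Week"].dt.isocalendar().week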
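As a self-contained sketch of the reset_index approach: the frame below is invented sample data, and `WeekNumber` is a hypothetical column name chosen here so that the original datetime column is not overwritten. It uses `.dt.isocalendar().week`, since `.dt.week` is no longer available in pandas 2.x.

```python
import pandas as pd

# Hypothetical sample data standing in for the real sales table.
df = pd.DataFrame({
    "Week": pd.to_datetime(["2018-01-06", "2018-01-13", "2020-07-25"]),
    "Sales": [1.0, 2.0, 3.0],
})

# reset_index() turns the "Week" index produced by groupby back into a column.
df_all_sales = df.groupby("Week").sum().reset_index()

df_all_sales["Year"] = df_all_sales["Week"].dt.year
df_all_sales["Month"] = df_all_sales["Week"].dt.month
# isocalendar() returns a DataFrame with year, week and day columns (ISO 8601).
df_all_sales["WeekNumber"] = df_all_sales["Week"].dt.isocalendar().week.astype(int)

print(df_all_sales)
```

Keeping the datetime in `Week` and storing the number separately means later steps can still sort or resample by date.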

Alternatively, if you want to access it from the DatetimeIndex, you can do the following:

df_all_sales["Year"] = df_all_sales.index.year
df_all_sales["Month"] = df_all_sales.index.month
df_all_sales["Week"] = df_all_sales.index.isocalendar().week
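One thing to watch with ISO week numbers is that the ISO year can differ from the calendar year around New Year. The sketch below uses three made-up Saturday dates (not taken from the data above) to show the difference between `index.year` and `index.isocalendar()`.

```python
import pandas as pd

# Hypothetical weekly dates straddling a year boundary.
idx = pd.DatetimeIndex(["2020-12-26", "2021-01-02", "2021-01-09"], name="Week")

# DatetimeIndex.isocalendar() returns a DataFrame with year, week and day columns.
iso = idx.isocalendar()

calendar_year = idx.year.tolist()
iso_year = iso["year"].astype(int).tolist()
iso_week = iso["week"].astype(int).tolist()

print(calendar_year)  # [2020, 2021, 2021]
print(iso_year)       # [2020, 2020, 2021]
print(iso_week)       # [52, 53, 1]
```

So 2021-01-02 has calendar year 2021 but belongs to ISO week 53 of 2020. If week numbers are used for grouping, pairing them with the ISO year rather than `index.year` avoids mixing weeks from different years.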

