Keep the time zone when saving datetimes in a DataFrame to CSV

Question

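I'm using the following code to save a timestamp to disk so that I can later work out how much time has passed since then. My problem is that when I use the businesstimedelta package, it raises an error saying my DataFrame has no time zone. I assume the time zone gets lost when saving to CSV: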

import pandas as pd
import time
import datetime
import pytz
import businesstimedelta
from pytz import timezone   


workday = businesstimedelta.WorkDayRule(start_time=datetime.time(9,30),end_time=datetime.time(16),working_days=[0, 1, 2, 3, 4])

timestamps = pd.DataFrame([datetime.datetime.now(timezone('America/New_York'))])
time.sleep(5)
timestamps.to_csv('timestamps.csv')
timestamps2 = pd.read_csv('timestamps.csv')
difference = workday.difference(timestamps2,datetime.datetime.now(timezone('America/New_York'))).hours

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-0e7502d68c65> in <module>
     10 timestamps.to_csv('timestamps.csv')
     11 timestamps2 = pd.read_csv('production temp/positions.csv')
---> 12 difference = workday.difference(timestamps2,datetime.datetime.now(timezone('America/New_York'))).hours

c:\users\g\appdata\local\programs\python\python38\lib\site-packages\businesstimedelta\rules\rule.py in difference(self, dt1, dt2)
     29     def difference(self, dt1, dt2):
     30         """Calculate the business time between two datetime objects."""
---> 31         dt1 = localize_unlocalized_dt(dt1)
     32         dt2 = localize_unlocalized_dt(dt2)
     33         start_dt, end_dt = sorted([dt1, dt2])

c:\users\g\appdata\local\programs\python\python38\lib\site-packages\businesstimedelta\businesstimedelta.py in localize_unlocalized_dt(dt)
      8     https://docs.python.org/3/library/datetime.html#datetime.timezone
      9     """
---> 10     if dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None:
     11         return dt
     12     return pytz.utc.localize(dt)

c:\users\g\appdata\local\programs\python\python38\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5463             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5464                 return self[name]
-> 5465             return object.__getattribute__(self, name)
   5466 
   5467     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'tzinfo'

Answer 1

I assume the time zone gets lost when saving to CSV

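Yes, that is part of your problem. CSV is a low-fidelity data format that doesn't preserve the data types of most objects. Initially, everything is read in as either numbers or strings, and it is then up to the CSV reader to figure out which data types to use. (Pandas does a decent job of auto-detecting them.)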
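A quick way to see this, as a minimal sketch that uses an in-memory buffer (io.StringIO) instead of the question's timestamps.csv file: after the round trip the cell comes back as a plain string, and the object that the question hands to businesstimedelta is the whole DataFrame, which has no tzinfo attribute at all.

```python
import io

import pandas as pd

# One tz-aware timestamp, shaped like the question's `timestamps` frame.
timestamps = pd.DataFrame([pd.Timestamp("2021-05-28 11:11:17", tz="America/New_York")])

# Round-trip through CSV text (a StringIO buffer stands in for timestamps.csv).
buf = io.StringIO()
timestamps.to_csv(buf)
buf.seek(0)
timestamps2 = pd.read_csv(buf)

value = timestamps2.loc[0, "0"]
print(type(value), value)              # <class 'str'> 2021-05-28 11:11:17-04:00
print(hasattr(timestamps2, "tzinfo"))  # False, hence the AttributeError in the traceback
```

The offset survives as text, but nothing in the file says it should be parsed back into a datetime.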

You have a few options here:

Convert the strings to the correct datetime type after reading in the DataFrame.

timestamps2["0"] = pd.to_datetime(timestamps2["0"])

Tell Pandas which converter to use while reading the file.

timestamps2 = pd.read_csv("./timestamps.csv", converters={"0": pd.to_datetime})

Export to a different file format that preserves data types, such as pickle.
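The three options above might look like this. It is a sketch that assumes the CSV text the question's to_csv call would produce; the csv_text string, the via_converter name, and the temporary pickle path are illustrative. One detail worth knowing: pd.to_datetime on strings such as 2021-05-28 11:11:17-04:00 yields a fixed -04:00 offset rather than the named America/New_York zone, so option 1 parses with utc=True and converts back with tz_convert. Depending on the pandas version, a column that mixes EST and EDT offsets may otherwise come back as object dtype, emit a warning, or raise an error.

```python
import io
import os
import tempfile

import pandas as pd

# What the question's to_csv call writes for a one-row frame (index column + column "0").
csv_text = ",0\n0,2021-05-28 11:11:17-04:00\n"

# Option 1: convert after reading. Going through UTC and converting back
# restores the named (DST-aware) zone instead of a fixed -04:00 offset.
timestamps2 = pd.read_csv(io.StringIO(csv_text), index_col=0)
timestamps2["0"] = pd.to_datetime(timestamps2["0"], utc=True).dt.tz_convert("America/New_York")

# Option 2: convert while reading, with a converter applied to each cell of column "0".
via_converter = pd.read_csv(io.StringIO(csv_text), index_col=0, converters={"0": pd.to_datetime})

# Option 3: a format that keeps the dtype, such as pickle.
original = pd.DataFrame({"0": [pd.Timestamp("2021-05-28 11:11:17", tz="America/New_York")]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "timestamps.pkl")
    original.to_pickle(path)
    restored = pd.read_pickle(path)

print(timestamps2["0"].dt.tz)                      # America/New_York
print(via_converter["0"].iloc[0])                  # 2021-05-28 11:11:17-04:00
print(restored["0"].dtype == original["0"].dtype)  # True
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.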

Now, once you've read the data and loaded it as a datetime dtype rather than object, you'll find that the Series holds pandas._libs.tslibs.timestamps.Timestamp values:

dt1 = datetime.datetime.now(timezone('America/New_York'))
timestamps = pd.Series(data=[dt1])
print(type(dt1)) # <class 'datetime.datetime'>
print(timestamps.dtype) # datetime64[ns, America/New_York]
print(type(timestamps.at[0])) # <class 'pandas._libs.tslibs.timestamps.Timestamp'>

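The businesstimedelta library doesn't appear to offer vectorized operations on Pandas objects, and it seems to work only with native Python datetime objects. So here is one solution: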

dt1 = datetime.datetime.now(timezone('America/New_York'))
dt2 = dt1 + datetime.timedelta(seconds=1)
timestamps = pd.Series([dt1])
timestamps.apply(lambda dt: workday.difference(dt.to_pydatetime(), dt2))

0    <BusinessTimeDelta 0 hours 1 seconds>
dtype: object
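The traceback in the question comes from passing the whole timestamps2 DataFrame to workday.difference: localize_unlocalized_dt reads dt.tzinfo, and a DataFrame has no such attribute. A minimal sketch of pulling out a single scalar instead, with a hard-coded "now" so the result is reproducible. The csv_text and edt names are illustrative, and the businesstimedelta call is left in a comment so the sketch runs without that package installed.

```python
import datetime
import io

import pandas as pd

csv_text = ",0\n0,2021-05-28 11:11:17-04:00\n"
timestamps2 = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Take ONE cell (not the whole DataFrame) and turn it into a native,
# tz-aware datetime.datetime.
saved = pd.to_datetime(timestamps2.loc[0, "0"]).to_pydatetime()

# Fixed "now" in EDT (UTC-4) for reproducibility; in practice use
# datetime.datetime.now(timezone('America/New_York')) as in the question.
edt = datetime.timezone(datetime.timedelta(hours=-4))
now = datetime.datetime(2021, 5, 28, 11, 11, 22, tzinfo=edt)

print(type(saved))  # <class 'datetime.datetime'>
print(now - saved)  # 0:00:05

# With businesstimedelta installed, this scalar is the kind of value
# workday.difference is given in the solution above:
#     workday.difference(saved, now)
```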

You should also look at Pandas' native support for time deltas: https://pandas.pydata.org/docs/user_guide/timedeltas.html

and its support for various kinds of business days: https://pandas.pydata.org/docs/user_guide/timeseries.html?highlight=business#dateoffset-objects
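A short sketch of both, assuming a recent pandas; the timestamps are made up for illustration. Plain subtraction gives calendar time, while pd.offsets.BusinessHour and pd.bdate_range only know about weekdays and the hours you give them:

```python
import pandas as pd

saved = pd.Series([pd.Timestamp("2021-05-28 11:11:17", tz="America/New_York")])
now = pd.Timestamp("2021-05-28 13:41:17", tz="America/New_York")

# Native timedeltas: plain (calendar) elapsed time, vectorized over the Series.
elapsed = now - saved
print(elapsed.iloc[0])                     # 0 days 02:30:00
print(elapsed.dt.total_seconds().iloc[0])  # 9000.0

# Business-hour offset with a 09:30-16:00, Monday-Friday window like the WorkDayRule above.
bh = pd.offsets.BusinessHour(start="09:30", end="16:00")
print(pd.Timestamp("2021-05-28 15:30") + bh)  # 2021-05-31 10:00:00 (Friday rolls to Monday)

# Business days in a date range (both ends included); holidays are not excluded.
print(len(pd.bdate_range("2021-05-28", "2021-06-04")))  # 6
```

Note that 2021-05-31 was Memorial Day, a US market holiday, yet BusinessHour still counts it; pd.offsets.CustomBusinessHour accepts a holidays argument if that matters for your use case.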

Answer 1 (1 vote, accepted 2021-05-28 15:11:17)