
Unexpected Behavior with Pandas Datetime / dt.date

I found a bug in my code: the date column of my dataframe contained only the date, without the hours and minutes. I traced the cause to running these two functions one after the other in the same script, rather than running each on its own. If I run either function alone, there is no problem. If I run them both, the results are unexpected.

I need to run these functions consecutively, but they do not depend on one another. I'm new to Python, so I thought this might be due to the inputs being overwritten or something (not that that would have happened in Java, as far as I know). So, I changed the functions to be as follows:

import pandas as pd

def func1(dataset):
    originalData = dataset
    # only look at one day at a time - remove extra unnecessary info
    originalData['Date'] = pd.to_datetime(originalData['Date'])
    print(dataset, 'test1')
    originalData['Date'] = originalData['Date'].dt.date
    print(dataset, 'test2')
    # other stuff

def func2(dataset):
    originalData2 = dataset
    # look at entire datetime
    originalData2['Date'] = pd.to_datetime(originalData2['Date'])
    print(originalData2)
    # other stuff

Run like this, I lose the time component in the second function:

csv = pd.read_csv(csvFileName)
func1(csv)
func2(csv)

Run like this, func2 produces my desired output:

csv = pd.read_csv(csvFileName)
func2(csv)

The weird thing is that if I run func1, test1 prints the date with the time, while test2 prints only the date. The dataset is being changed even though the changes are applied to originalData. Am I misunderstanding something? Thanks in advance.

If you don't want to change the underlying data, I'd recommend setting your data inside the function like this: originalData = dataset.copy(). Plain assignment (originalData = dataset) only binds a second name to the same DataFrame, so a change made through either name is visible through both. copy() returns a deep copy by default, meaning you'll only be editing the data within the function and not overwriting the caller's object.
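As a rough illustration of the suggestion above, here is a minimal sketch of the two functions rewritten to work on copies. The sample DataFrame stands in for the CSV file, and the snake_case variable names and return values are assumptions added for the example, not part of the original code.

```python
import pandas as pd

def func1(dataset):
    # Work on a private copy so the caller's DataFrame is left untouched.
    original_data = dataset.copy()
    # Keep only the calendar date (drops hours and minutes).
    original_data['Date'] = pd.to_datetime(original_data['Date']).dt.date
    return original_data

def func2(dataset):
    # Same idea: copy first, then convert to full datetimes.
    original_data2 = dataset.copy()
    original_data2['Date'] = pd.to_datetime(original_data2['Date'])
    return original_data2

# Hypothetical sample data standing in for pd.read_csv(csvFileName).
csv = pd.DataFrame({'Date': ['2020-01-01 10:30:00', '2020-01-02 23:15:00']})

dates_only = func1(csv)       # dates without the time
full_datetimes = func2(csv)   # still has hours and minutes
```

Because each function copies its input, the order in which they are called no longer matters, and csv still holds the original strings afterwards.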

Odd behavior, yes.

You may also run into this when taking slices of dataframes and doing transformations on them.
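For the slicing case, a small sketch of the same remedy might look like the following. The column names and values are made up for illustration. Without the explicit copy(), pandas may emit a SettingWithCopyWarning on the assignment, and whether the parent frame is affected can depend on the pandas version and settings.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    'Date': ['2020-01-01 10:30:00', '2020-01-02 23:15:00', '2020-01-03 08:00:00'],
    'value': [1, 2, 3],
})

# Take an explicit copy of the slice before transforming it,
# so the transformation cannot leak back into df.
subset = df[df['value'] > 1].copy()
subset['Date'] = pd.to_datetime(subset['Date'])
```

Here subset holds parsed datetimes for the two selected rows, while df keeps its original string column.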
