Pandas: Iterating over rows, Adding and Subtracting Dates, Appending those dates to a new column depending on row value

To give a brief overview of what's going on: I am observing temperature fluctuations and have filtered the indoor and outdoor temperature data from an office down to the periods where the temperature fluctuates. These fluctuations only occur in the mornings and at night, because during the day the temperature is controlled. I will be using an ANN to learn from these fluctuations and model how long it takes for the temperature to change depending on other variables like OutdoorTemp, SolarDiffuseRate, etc.

Question 1: How do I iterate by row, first looking at the times and adding a binary column where 0 means morning and 1 means night-time?

Question 2: For each day, the series of rows for mornings and evenings will have a different length, depending on how long it takes the temperature to change between 22 degrees and 17 degrees. How do I add a column, for each day and for each morning and evening, that states the time it took for the temperature to get from X to Y?

Basically, adding or subtracting times to get the difference, then appending it per morning or night.

                     OfficeTemp  OutdoorTemp  SolarDiffuseRate 
DateTime                                                         
2006-01-01 07:15:00   19.915275       0.8125               0.0   
2006-01-01 07:30:00   20.463506       0.8125               0.0   
2006-01-01 07:45:00   20.885112       0.8125               0.0   
2006-01-01 20:15:00   19.985398       8.3000               0.0   
2006-01-01 20:30:00   19.157857       8.3000               0.0   
...                         ...          ...               ...   
2006-06-30 22:00:00   18.056205      22.0125               0.0   
2006-06-30 22:15:00   17.993072      19.9875               0.0   
2006-06-30 22:30:00   17.929643      19.9875               0.0   
2006-06-30 22:45:00   17.867148      19.9875               0.0   
2006-06-30 23:00:00   17.804429      19.9875               0.0   

Here is some synthetic data with the same layout to work with:

import numpy as np
import pandas as pd

df = pd.DataFrame(index=pd.date_range('2006-01-01', '2006-06-30', freq='15min'))
df['OfficeTemp'] = np.random.normal(loc=20, scale=5, size=df.shape[0])
df['OutdoorTemp'] = np.random.normal(loc=12, scale=5, size=df.shape[0])
df['SolarDiffuseRate'] = 0.0

Question 1:

df['PartofDay'] = df.index.hour.map(lambda x: 0 if x < 12 else 1)
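
As a possible alternative to the lambda, the same flag can be computed with a vectorized comparison on the index. This is only a sketch on a tiny hypothetical frame (the name `frame` and its two sample rows are made up for illustration), and it assumes, as above, that noon is an acceptable cut-off between morning and night.

```python
import pandas as pd

# Hypothetical two-row frame: one morning reading and one evening reading
idx = pd.to_datetime(["2006-01-01 07:15", "2006-01-01 20:15"])
frame = pd.DataFrame({"OfficeTemp": [19.9, 20.0]}, index=idx)

# 0 for hours before noon, 1 for noon and later
frame["PartofDay"] = (frame.index.hour >= 12).astype(int)
print(frame)
```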

For question 2, a tolerance would need to be defined (the temperature is never going to be exactly 17 or 22 degrees).

import numpy as np

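def temp_change_duration(group, tol=0.3):
    # Timestamps where OfficeTemp is within tol of each target temperature
    near_17 = group.index[np.isclose(group['OfficeTemp'], 17, atol=tol)]
    near_22 = group.index[np.isclose(group['OfficeTemp'], 22, atol=tol)]
    if near_17.empty or near_22.empty:
        return pd.NaT  # a target was never reached in this morning/evening
    return abs(near_22[0] - near_17[0])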
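
To see what the tolerance does in isolation, here is a small sketch on a made-up array of readings (the values in `temps` are hypothetical). `np.isclose` with `atol=0.3` flags only the samples within roughly 0.3 degrees of the target, so with 15-minute data a fast temperature change can step right over the window and leave no match at all.

```python
import numpy as np

# Hypothetical readings from one warm-up period
temps = np.array([16.2, 16.8, 17.4, 18.9, 21.8, 22.5])
tol = 0.3

# Boolean masks marking readings within tol of each target temperature
near_17 = np.isclose(temps, 17, atol=tol)
near_22 = np.isclose(temps, 22, atol=tol)

print(temps[near_17])  # readings counted as "17 degrees"
print(temps[near_22])  # readings counted as "22 degrees"
```

If matches are often missing in the real data, widening `tol` is one option to try.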

Then apply this function to our df:

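# Group by calendar date: index.day is only the day of the month, so it would merge the 1st of every month
df.groupby([df.index.date, 'PartofDay']).apply(temp_change_duration)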

This will get you most of the way there, but it will give odd answers on the normally distributed synthetic data I've generated. See if you can adapt temp_change_duration to work with your data.
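
For reference, here is a rough end-to-end sketch on a small hand-made frame that behaves more like real data than the random sample does. The readings, the `Date` helper column and the `ChangeDuration` column name are all assumptions made for illustration. It computes one duration per date and part of day, then joins that value back onto every row, which is one way to get the per-row column asked for in question 2.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: one morning warm-up and one evening cool-down
index = pd.to_datetime([
    "2006-01-01 07:00", "2006-01-01 07:15", "2006-01-01 07:30",
    "2006-01-01 07:45", "2006-01-01 20:00", "2006-01-01 20:15",
    "2006-01-01 20:30",
])
demo = pd.DataFrame(
    {"OfficeTemp": [17.1, 18.9, 20.6, 21.9, 22.1, 19.5, 16.9]}, index=index
)
demo["PartofDay"] = np.where(demo.index.hour < 12, 0, 1)
demo["Date"] = demo.index.date  # helper column (assumed name) used as a group key


def temp_change_duration(group, tol=0.3):
    # Timestamps where OfficeTemp is within tol of each target temperature
    near_17 = group.index[np.isclose(group["OfficeTemp"], 17, atol=tol)]
    near_22 = group.index[np.isclose(group["OfficeTemp"], 22, atol=tol)]
    if near_17.empty or near_22.empty:
        return pd.NaT  # a target was never reached in this group
    return abs(near_22[0] - near_17[0])


# One duration per (Date, PartofDay) group
durations = (
    demo.groupby(["Date", "PartofDay"])[["OfficeTemp"]]
    .apply(temp_change_duration)
    .rename("ChangeDuration")
)

# Broadcast each group's duration back onto its rows
demo = demo.join(durations, on=["Date", "PartofDay"])
print(demo)
```

Every row of a given morning or evening ends up carrying the same ChangeDuration value, so the column can be fed to a model alongside OutdoorTemp and the other variables.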
