Pandas: Iterating over rows, adding and subtracting dates, and appending those dates to a new column depending on row value
To give a brief overview of what's going on: I am observing temperature fluctuations and have filtered indoor and outdoor temperature data from an office down to the periods where the temperature fluctuates. These fluctuations only occur in the mornings and at night, because during the day the temperature is controlled. I will be using an ANN to learn from these fluctuations and model how long it takes for the temperature to change depending on other variables such as OutdoorTemp, SolarDiffuseRate, etc.
Question 1: How do I iterate by row, looking at the times and adding a binary column where 0 means morning and 1 means night-time?
Question 2: For each day, the series of rows for mornings and evenings will have a different length, depending on how long it takes the temperature to change between 22 degrees and 17 degrees. How do I add a column, for each day and for each morning and evening, that states the time it took for the temperature to get from X to Y? Basically, adding or subtracting times to get the difference, then appending it per morning or night.
                     OfficeTemp  OutdoorTemp  SolarDiffuseRate
DateTime
2006-01-01 07:15:00   19.915275       0.8125               0.0
2006-01-01 07:30:00   20.463506       0.8125               0.0
2006-01-01 07:45:00   20.885112       0.8125               0.0
2006-01-01 20:15:00   19.985398       8.3000               0.0
2006-01-01 20:30:00   19.157857       8.3000               0.0
...                         ...          ...               ...
2006-06-30 22:00:00   18.056205      22.0125               0.0
2006-06-30 22:15:00   17.993072      19.9875               0.0
2006-06-30 22:30:00   17.929643      19.9875               0.0
2006-06-30 22:45:00   17.867148      19.9875               0.0
2006-06-30 23:00:00   17.804429      19.9875               0.0
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: one row every 15 minutes
df = pd.DataFrame(index=pd.date_range('2006-01-01', '2006-06-30', freq='15min'))
df['OfficeTemp'] = np.random.normal(loc=20, scale=5, size=df.shape[0])
df['OutdoorTemp'] = np.random.normal(loc=12, scale=5, size=df.shape[0])
df['SolarDiffuseRate'] = 0.0
Question 1:
df['PartofDay'] = df.index.hour.map(lambda x: 0 if x < 12 else 1)
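The same flag can also be built without a per-element lambda. The following is only a sketch on a tiny made-up frame (the `demo` name and its values are assumptions for illustration, not the asker's data), using `np.where` on the hour of the index:

```python
import numpy as np
import pandas as pd

# Small made-up frame; the values are for illustration only.
idx = pd.to_datetime([
    "2006-01-01 07:15", "2006-01-01 07:30",
    "2006-01-01 20:15", "2006-01-01 20:30",
])
demo = pd.DataFrame({"OfficeTemp": [19.9, 20.5, 20.0, 19.2]}, index=idx)

# 0 for rows before noon, 1 for rows at or after noon.
demo["PartofDay"] = np.where(demo.index.hour < 12, 0, 1)
print(demo)
```

Noon is used as the cut-off here, matching the lambda above; a different boundary may suit data where "night" readings can run past midnight.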
For question 2, a tolerance needs to be defined (the temperature is never going to be exactly 17 or 22 degrees).
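import numpy as np

def temp_change_duration(group):
    tol = 0.3  # how close a reading must be to 17 or 22 to count as a match
    # First timestamp in the group within tol of each target temperature
    first_time = group.index[np.isclose(group['OfficeTemp'], 17, atol=tol)][0]
    second_time = group.index[np.isclose(group['OfficeTemp'], 22, atol=tol)][0]
    return abs(second_time - first_time)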
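One thing to watch for: the `[0]` lookups raise an `IndexError` when a group has no reading within the tolerance of 17 or 22. A possible variant, sketched here under the assumption that a missing match should simply yield `NaT`, is shown below. The name `temp_change_duration_safe` and its `low`, `high`, and `tol` parameters are hypothetical, introduced only for this example.

```python
import numpy as np
import pandas as pd

def temp_change_duration_safe(group, low=17.0, high=22.0, tol=0.3):
    """Time between the first reading near `low` and the first near `high`.

    Returns pd.NaT if either target is never reached within `tol`.
    """
    temps = group["OfficeTemp"].to_numpy()
    near_low = group.index[np.isclose(temps, low, atol=tol)]
    near_high = group.index[np.isclose(temps, high, atol=tol)]
    if len(near_low) == 0 or len(near_high) == 0:
        return pd.NaT
    return abs(near_high[0] - near_low[0])

# Made-up groups for illustration only.
reached = pd.DataFrame(
    {"OfficeTemp": [17.1, 19.0, 21.9]},
    index=pd.date_range("2006-01-01 07:00", periods=3, freq="15min"),
)
not_reached = pd.DataFrame(
    {"OfficeTemp": [19.0, 20.0]},
    index=pd.date_range("2006-01-01 20:00", periods=2, freq="15min"),
)

print(temp_change_duration_safe(reached))
print(temp_change_duration_safe(not_reached))
```

Returning `NaT` keeps the result usable in a grouped `apply`, and the unmatched groups can be dropped or inspected afterwards.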
Then apply this function to our df:
# Group by calendar date (df.index.day would merge the 1st of every month)
df.groupby([df.index.date, 'PartofDay']).apply(temp_change_duration)
This will get you most of the way there, but it will give funny answers with the normally distributed synthetic data I've generated. See if you can adapt temp_change_duration to work with your data.
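To get the result onto every row as a new column, one option is `groupby(...).transform`. The sketch below makes a simplifying assumption that differs from the tolerance approach: because the data are already filtered to the fluctuating periods, it treats the span between the first and last timestamp of each date and part-of-day block as the duration. The `demo` frame and the `ChangeDuration` column name are made up for illustration.

```python
import pandas as pd

# Made-up rows: two blocks on 1 January, one on 2 January.
idx = pd.to_datetime([
    "2006-01-01 07:15", "2006-01-01 07:30", "2006-01-01 07:45",
    "2006-01-01 20:15", "2006-01-01 20:30",
    "2006-01-02 07:00", "2006-01-02 07:15",
])
demo = pd.DataFrame(
    {"OfficeTemp": [19.9, 20.5, 20.9, 20.0, 19.2, 19.5, 20.1]}, index=idx
)
demo["PartofDay"] = (demo.index.hour >= 12).astype(int)

# Put the timestamps in a Series so they can be aggregated per group.
stamps = demo.index.to_series()
grouped = stamps.groupby([demo.index.date, demo["PartofDay"]])

# transform returns one value per original row, so the result lines up
# with demo and can be assigned directly as a column.
demo["ChangeDuration"] = grouped.transform("max") - grouped.transform("min")
print(demo)
```

If the tolerance-based definition is preferred, the per-group durations from `apply` could instead be joined back onto the rows using the same date and PartofDay keys.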