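Python pandas: calculating the average temperature sensor reading in the 15 minutes before a survey was filled in (matching timestamps + adding new columns)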

Question

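I am trying to add new columns at the end of an Excel file that contains survey data (= PIT_da.xlsx). These columns should hold the average sensor values (e.g. temperature) over the 15, 30 and 60 minutes before the survey was filled in. The sensor data, including timestamps, is in the Excel file "IEQ_da.xlsx".

This is how I started: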

#import raw file
import pandas as pd
import numpy as np
dfSD = pd.read_excel('IEQ_da.xlsx')
dfPIT = pd.read_excel('PIT_da.xlsx')

#main aim: add after each survey result row in PIT_da.xlsx columns for the average values of the indoor environmental quality parameters in 15/30/60 minutes before submitting the survey

#Step 0: set both timestamp and submitdate to right datetime object
dfSD['timestamp'] =  pd.to_datetime(dfSD['timestamp'], format='%d%b%Y:%H:%M:%S.%f')
dfPIT['submitdate'] =  pd.to_datetime(dfPIT['submitdate'], format='%d%b%Y:%H:%M:%S.%f')

#Step 1: introduce arrays and set to numpy
array1 = dfSD[['timestamp']].to_numpy().ravel()
array2 = dfPIT[['submitdate']].to_numpy().ravel()
data_sensorID = dfSD[['devid']].to_numpy().ravel()
survey_sensorID = dfPIT[['PIT5']].to_numpy().ravel()

Each survey has a timestamp (= submitdate) and should be matched to the sensor data at that timestamp.

The times are converted to numbers so that the 15 min / 30 min / 60 min differences can be calculated:

#Step 2: convert timestamps to seconds since the Unix epoch
def timestamps(x) :
    Timestamps = np.empty(x.size)
    # numpy datetime64 has no time zone support, so the epoch is written without a trailing 'Z'
    epoch = np.datetime64('1970-01-01T00:00:00')
    for i in range(x.size) :
        dt64 = np.datetime64(x[i])
        Timestamps[i] = (dt64 - epoch) / np.timedelta64(1, 's')
    return Timestamps

array1TS = timestamps(array1)
array2TS = timestamps(array2)
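
For comparison, the same conversion can be written without an explicit loop, because NumPy subtracts and divides datetime arrays element-wise. This is only a sketch: the helper name `to_epoch_seconds` and the two example timestamps are made up for illustration and are not part of the original files.

```python
import numpy as np
import pandas as pd

def to_epoch_seconds(values):
    """Convert datetime-like values to float seconds since 1970-01-01 (hypothetical helper)."""
    values = np.asarray(values, dtype="datetime64[ns]")
    epoch = np.datetime64("1970-01-01T00:00:00")
    # datetime64 - datetime64 gives timedelta64; dividing by one second gives floats
    return (values - epoch) / np.timedelta64(1, "s")

# Two made-up timestamps that lie 15 minutes apart
stamps = pd.to_datetime(pd.Series(["2020-04-14 00:18:00", "2020-04-14 00:03:00"])).to_numpy()
seconds = to_epoch_seconds(stamps)
print(seconds[0] - seconds[1])  # gap between the two timestamps, in seconds
```

In practice the conversion may not be needed at all, since pandas can compare datetime columns and subtract `pd.Timedelta` values directly.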

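Next, each survey submission time is matched against the sensor timestamps (already rounded to the nearest 5 minutes), with the additional condition that the sensor device ID (= devid) equals PIT5 (the survey question asking for the ID of the nearby sensor).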

#Step 3: define match with conditions: must be same timestamp and must have same sensor ID, by means of a matrix
# start from zeros so that pairs with different sensor IDs are 0 rather than uninitialized memory
Match = np.zeros([array1TS.size, array2TS.size])
for i in range(array1TS.size) :
    for j in range(array2TS.size):
        if data_sensorID[i] == survey_sensorID[j] and array1TS[i] == array2TS[j]:
            Match[i,j] = 1
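
An exact match on timestamp and sensor ID can also be expressed as a join instead of a match matrix. The sketch below uses `DataFrame.merge` on two small made-up frames; the column names follow the question, but the rows are illustrative only and the timestamps are assumed to be rounded to 5 minutes already.

```python
import pandas as pd

# Made-up sensor readings
sensor = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-04-14 00:15:00", "2020-04-14 00:15:00", "2020-04-14 00:20:00"]
    ),
    "devid": ["4", "2", "4"],
    "SENtemp": [20.2, 18.8, 20.1],
})
# Made-up survey submissions
survey = pd.DataFrame({
    "submitdate": pd.to_datetime(["2020-04-14 00:15:00", "2020-04-14 00:20:00"]),
    "PIT5": ["4", "2"],
})

# A left join keeps every survey row; surveys without a matching reading get NaN
matched = survey.merge(
    sensor,
    how="left",
    left_on=["submitdate", "PIT5"],
    right_on=["timestamp", "devid"],
)
print(matched[["submitdate", "PIT5", "SENtemp"]])
```

Here the first survey finds the reading of device 4 at 00:15, while the second has no reading for device 2 at 00:20 and is left empty. A join scales with the number of rows rather than with the product of both tables, which matters for long sensor logs.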

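Now, using this match, a new column should be added to "PIT_da.xlsx" containing the average of the "SENtemp" column (the temperature values) in the IEQ_da.xlsx file over the 15 minutes before the matching timestamp.

Questions:
1. How do I go from "Match" to selecting all rows that fall in the 15 minutes before the matching timestamp?
2. How do I calculate the average of these selected rows (ignoring empty cells) and put it in a new column in PIT_da.xlsx? This new column should be named "SENtemp_15", for the temperature in the 15 minutes before the survey was filled in.

Some rows of data for reference: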
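
To make the two questions concrete, here is a minimal sketch for a single survey: a boolean mask selects the readings in the 15 minutes before submission, and `Series.mean()` skips empty (NaN) cells by default. The readings and the `submit` time are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up readings every 5 minutes, one of them empty
sensor = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-04-14 00:00:00", "2020-04-14 00:05:00",
        "2020-04-14 00:10:00", "2020-04-14 00:15:00",
    ]),
    "SENtemp": [19.0, 20.0, np.nan, 21.0],
})
submit = pd.Timestamp("2020-04-14 00:15:00")  # hypothetical submission time

# Keep readings in the half-open window (submit - 15 min, submit]
in_window = (sensor["timestamp"] > submit - pd.Timedelta(minutes=15)) & (
    sensor["timestamp"] <= submit
)
mean_15 = sensor.loc[in_window, "SENtemp"].mean()  # NaN is ignored
print(mean_15)
```

The window here excludes the reading exactly 15 minutes earlier and includes the one at submission time; whether that is the right boundary choice depends on how the survey data should be interpreted.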

IEQ_da.xlsx

import pandas as pd

df = pd.DataFrame({'timestamp' : ['14/04/2020  00:18:00', '14/04/2020  00:18:05', '14/04/2020  00:17:55', '14/04/2020  00:17:50' , '14/04/2020  00:17:40', '14/04/2020  00:17:40', '14/04/2020  00:17:20', '14/04/2020  00:17:20'],
                   'devid' : ['4', '2', '4', '2', '4' , '2' , '4' , '2'],
                   'SENtemp' : ['20,2', '18,8', '20,1', '19', '20,2', '18,8', '20,1', '18,9']})
df

PIT_da.xlsx

import pandas as pd

df = pd.DataFrame({'submitdate' : ['14/04/2020  00:18:00', '14/04/2020  00:18:05'], 'PIT5' : ['4', '2'],
                   })
df
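
The sample rows for IEQ_da.xlsx store timestamps as day-first strings and temperatures with a decimal comma, so both need converting before any arithmetic. The sketch below shows one possible way on two of those rows; the frame names `raw` and `clean` are made up, and the real Excel files may already arrive with proper types.

```python
import pandas as pd

# Two rows copied from the IEQ_da.xlsx sample
raw = pd.DataFrame({
    "timestamp": ["14/04/2020  00:18:00", "14/04/2020  00:17:55"],
    "SENtemp": ["20,2", "19"],
})

clean = raw.copy()
# Collapse the double space, then parse as day/month/year
clean["timestamp"] = pd.to_datetime(
    raw["timestamp"].str.replace(r"\s+", " ", regex=True),
    format="%d/%m/%Y %H:%M:%S",
)
# Replace the decimal comma by a dot before converting to float
clean["SENtemp"] = raw["SENtemp"].str.replace(",", ".", regex=False).astype(float)
print(clean.dtypes)
```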

I hope someone is willing to help me!

Answer 1

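Your 2 initial steps are rather useless. You can use apply directly on dfPIT to build the new columns. The hardest part is that SENtemp is a string column with a comma as decimal separator, so it cannot be converted to float directly. Possible code: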

delta = [15, 30, 60]  # window lengths in minutes

columns = [f'Average{i}' for i in delta]  # one column name per window length

# for each survey row, average SENtemp over (submitdate - i minutes, submitdate];
# the decimal comma is replaced by a dot before converting to float
dfPIT[columns] = dfPIT.apply(axis=1, func=lambda x: pd.Series(
    [dfSD.loc[(dfSD['timestamp']>x['submitdate'] - pd.Timedelta(i, 'min'))
              &(dfSD['timestamp']<=x['submitdate']), 'SENtemp']
     .str.replace(',','.').astype('float').mean() for i in delta],
    index=columns))

With your sample data, it gives:

           submitdate PIT5  Average15  Average30  Average60
0 2020-04-14 00:18:00    4  19.614286  19.614286  19.614286
1 2020-04-14 00:18:05    2  19.512500  19.512500  19.512500
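
Note that the averages in this output appear to include readings from both devices (19.614286 is the mean of all seven sample readings up to 00:18:00). If each survey should only use its own sensor, as the question describes with PIT5 and devid, one possible variant adds that condition to the mask and names the columns SENtemp_15, SENtemp_30 and SENtemp_60. This is a sketch, not the answerer's code: the helper `window_mean` is made up, and the sample rows are assumed to be already parsed to datetime and float.

```python
import pandas as pd

# Sample rows from the question, assumed already converted to datetime / float
dfSD = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-04-14 00:18:00", "2020-04-14 00:18:05", "2020-04-14 00:17:55",
        "2020-04-14 00:17:50", "2020-04-14 00:17:40", "2020-04-14 00:17:40",
        "2020-04-14 00:17:20", "2020-04-14 00:17:20",
    ]),
    "devid": ["4", "2", "4", "2", "4", "2", "4", "2"],
    "SENtemp": [20.2, 18.8, 20.1, 19.0, 20.2, 18.8, 20.1, 18.9],
})
dfPIT = pd.DataFrame({
    "submitdate": pd.to_datetime(["2020-04-14 00:18:00", "2020-04-14 00:18:05"]),
    "PIT5": ["4", "2"],
})

def window_mean(row, minutes):
    """Mean SENtemp of the survey's own sensor in the `minutes` before submission."""
    start = row["submitdate"] - pd.Timedelta(minutes=minutes)
    mask = (
        (dfSD["devid"] == row["PIT5"])
        & (dfSD["timestamp"] > start)
        & (dfSD["timestamp"] <= row["submitdate"])
    )
    return dfSD.loc[mask, "SENtemp"].mean()  # NaN values are skipped by default

# One new column per window length, named as requested in the question
for minutes in (15, 30, 60):
    dfPIT[f"SENtemp_{minutes}"] = dfPIT.apply(window_mean, axis=1, minutes=minutes)

print(dfPIT)
```

Because all sample readings fall within a single minute, the three windows give the same value per survey here; longer sensor logs would be needed to see them differ. The result could then be written back with `dfPIT.to_excel(...)`.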

Solution 1
0 2020-05-23 15:13:50