简体   繁体   English

如何平均正确的数据:温度 - 每天和站点的平均温度

[英]How to average correct data: Temperature - Average Temperature per day and station

I have to subtract the daily average temperature to each temperature value during the day also taking in account the station it comes from, along 9 months.我必须将日平均温度减去白天的每个温度值,同时考虑到它来自的站点,以及 9 个月。 I already have the average, my data is something like this:我已经有了平均值,我的数据是这样的:

 >>> df1
   Station     Date                Temperature
0  Station1    2022-05-1 9:30:00   7,4
1  Station1    2022-05-1 9:45:00   7,45
2  Station1    2022-05-1 10:00:00  8,2
3  Station1    2022-05-1 10:15:00  8,4
4  Station1    2022-05-1 10:30:00  8,9
5  Station1    2022-05-1 9:30:00   7,5
6  Station2    2022-05-1 9:45:00   7,56
7  Station2    2022-05-1 10:00:00  8,4
8  Station2    2022-05-1 10:15:00  8,7
9  Station2    2022-05-1 10:30:00  8,1
10 ...

>>> df2
   Station     Date        AverageTemperaturePerDayAndStation
0  Station1    2022-05-1   8
1  Station1    2022-05-2   8,3
2  Station1    2022-05-3   8,6
3  Station1    2022-05-4   8,4
4  Station1    2022-05-5   7,9
5  Station2    2022-05-1   6
6  Station2    2022-05-2   7,3
7  Station2    2022-05-3   8,6
8  Station2    2022-05-4   7,4
9  Station2    2022-05-5   6,9
10 ...

So I want R to substranct Temperature - AverageTemperaturePerDayAndStation just like this:所以我想要 R 减去温度 - AverageTemperaturePerDayAndStation 就像这样:

>>> df3
   Station     Date                CorrectedTemperature 
0  Station1    2022-05-1 9:30:00   7,4  - 8
1  Station1    2022-05-1 9:45:00   7,45 - 8
2  Station1    2022-05-1 10:00:00  8,2  - 8
3  Station1    2022-05-1 10:15:00  8,4  - 8
4  Station1    2022-05-1 10:30:00  8,9  - 8
5  Station1    2022-05-1 9:30:00   7,5  - 8
6  Station2    2022-05-1 9:45:00   7,56 - 6
7  Station2    2022-05-1 10:00:00  8,4  - 6
8  Station2    2022-05-1 10:15:00  8,7  - 6
9  Station2    2022-05-1 10:30:00  8,1  - 6
10 ...

I would recommend the following:我会推荐以下内容:

#generate dataframes df1 and df2
df1 <- data.frame(Station =c(rep("Station1",5), rep("Station2",5)),
                  Date = c("2022-05-1 9:30:00","2022-05-1 9:45:00","2022-05-1 10:00:00","2022-05-1 10:15:00", "2022-05-1 10:30:00",
                           "2022-05-1 9:30:00","2022-05-1 9:45:00","2022-05-1 10:00:00","2022-05-1 10:15:00", "2022-05-1 10:30:00"),
                  Temperature = c(7.4, 7.45, 8.2, 8.4, 8.9, 7.5, 7.56, 8.4, 8.7, 8.1))
df2 <- data.frame(Station=c(rep("Station1",5), rep("Station2",5)),
                  Date = c("2022-05-01","2022-05-02","2022-05-03","2022-05-04","2022-05-05",
                           "2022-05-01","2022-05-02","2022-05-03","2022-05-04","2022-05-05"),
                  AverageTemperaturePerDayAndStation =c(8, 8.3, 8.6, 8.4, 7.9, 6, 7.3, 8.6, 7.4, 6.9))

#save Time and date in separate columns
df1$Time <- format(as.POSIXct(df1$Date), format = "%H:%M:%S")
df1$Date <- as.Date(df1$Date)

#change format of Date in df2 with as.Date
df2$Date <- as.Date(df2$Date)

#use left join (i.e. keep all entries from the first dataframe) and join by both Date and Station
df3 <- dplyr::left_join(df1, df2, by =c("Date","Station"))

Then, you can calculate the new column CorrectedTemperature.然后,您可以计算新列 CorrectedTemperature。

df3$CorrectedTemperature <- df3$Temperature - df3$AverageTemperaturePerDayAndStation

Hope this could help.希望这能有所帮助。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM