Replace values based on months in a dataframe with values in another column in R, using apply functions

I am working with a time series of precipitation data and trying to use median imputation: every data point with a value of 0 should be replaced with the median of all data points for the month in which that 0 was recorded.

I have two data frames, one with the original precipitation data:

 > head(df.m)
       prcp       date
1 121.00485 1975-01-31
2 122.41667 1975-02-28
3  82.74026 1975-03-31
4 104.63514 1975-04-30
5  57.46667 1975-05-31
6  38.97297 1975-06-30

And one with the median monthly values:

> medians
   Group.1         x
1       01 135.90680
2       02 123.52613
3       03 113.09841
4       04  98.10044
5       05  75.21976
6       06  57.47287
7       07  54.16667
8       08  45.57653
9       09  77.87740
10      10 103.25179
11      11 124.36795
12      12 131.30695

Below is the current solution that I have come up with, based on the 1st answer here:

df.m[,"prcp"] <- sapply(df.m[,"prcp"], function(y) ifelse(y==0, medians$x,y))

This has not worked, as it only applies the first value in medians, which is the median for January (01). How can I get the correct median applied for the corresponding month?
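A minimal sketch of why this happens, using made-up numbers rather than the data above (`med` and `prcp` are hypothetical stand-ins for `medians$x` and `df.m$prcp`). `sapply()` hands the function one value at a time, so the test `y == 0` has length 1, and `ifelse()` returns a result the same length as its test. Only the first element of the vector passed as the `yes` argument is ever used:

```r
# Hypothetical toy values, not the data from the question
med  <- c(10, 20, 30)   # stands in for medians$x
prcp <- c(5, 0, 0)      # stands in for df.m$prcp

# Each call sees a single y, so ifelse() returns a length-1 result
# and takes only med[1] from the 'yes' vector.
out <- sapply(prcp, function(y) ifelse(y == 0, med, y))
print(out)
```

Both zeros come back as 10, the first element of `med`, regardless of which month they belong to.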

Another way I have attempted a solution is the following:

df.m[,"prcp"] <- sapply(medians$Group.1, function(y)
                 ifelse(df.m[format.Date(df.m$date, "%m") == y & 
                 df.m$prcp == 0, "prcp"], medians[medians$Group.1 == y,"x"], 
                 df.m[,"prcp"]))   

Description of the above function: it tests for, and returns, the zeros in df.m[,"prcp"] for every month that has a zero value. It has the same issue as the 1st solution, but it does return all of the 0 values by month (if just executing the sapply() portion).

How can I replace all 0 in df.m$prcp with their corresponding medians from the medians df based on the month of the data?

Apologies if this is a basic question, I'm somewhat of a newbie here. Any and all help would be greatly appreciated.

Consider merging the two dataframes by month/group and then calculating with ifelse:

# MERGE TWO FRAMES
df.m$month <- format(df.m$date, "%m")
df.merge <- merge(df.m, medians, by.x="month", by.y="Group.1")

# CONDITIONAL CALCULATION
df.merge$prcp <- ifelse(df.merge$prcp == 0, df.merge$x, df.merge$prcp)

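# RETURN BACK TO ORIGINAL STRUCTURE (merge() SORTS BY MONTH, SO REORDER BY DATE
# AND KEEP ONLY THE ORIGINAL COLUMNS)
df.m <- df.merge[order(df.merge$date), c("prcp", "date")]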
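As an alternative that avoids the merge and leaves the row order untouched, here is a rough sketch using `match()` as a lookup. The data frames below are small illustrative stand-ins for `df.m` and `medians`; it assumes `date` is of class `Date` and `Group.1` holds two-digit month strings, which may not match the real data exactly:

```r
# Toy stand-ins for the question's data frames (values are illustrative)
df.m <- data.frame(
  prcp = c(121.00485, 0, 82.74026, 0),
  date = as.Date(c("1975-01-31", "1975-02-28", "1975-03-31", "1976-02-29"))
)
medians <- data.frame(
  Group.1 = c("01", "02", "03"),
  x = c(135.90680, 123.52613, 113.09841),
  stringsAsFactors = FALSE
)

# Position of each row's month within medians$Group.1
idx <- match(format(df.m$date, "%m"), medians$Group.1)

# Row numbers of the zeros; which() also skips any NA in prcp
zero <- which(df.m$prcp == 0)

# Replace zeros only; all other rows are left untouched
df.m$prcp[zero] <- medians$x[idx[zero]]
print(df.m)
```

Because the replacement is an indexed assignment on the original data frame, no extra columns are added and nothing needs to be re-sorted afterwards.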

A dplyr version, which does not rely on original order. This uses slightly modified test data to show replacement of zeroes and multiple years.

require(dplyr)

## test data with zeroes - extended for additional years
df.m <- read.delim(text="
i prcp date
1 121.00485 1975-01-31
2 122.41667 1975-02-28
3 82.74026 1975-03-31
4 104.63514 1975-04-30
5 57.46667 1975-05-31
6 38.97297 1975-06-30
7 0 1976-06-30
8 0 1976-07-31
9 70 1976-08-31
", sep="", stringsAsFactors = FALSE)

medians <- read.delim(text="
i month x
1       01 135.90680
2       02 123.52613
3       03 113.09841
4       04  98.10044
5       05  75.21976
6       06  57.47287
7       07  54.16667
8       08  45.57653
9       09  77.87740
10      10 103.25179
11      11 124.36795
12      12 131.30695
", sep = "", stringsAsFactors = FALSE, strip.white = TRUE)

# extract the month as integer
df.m$month = as.integer(substr(df.m$date,6,7))

# match to medians by joining
result <- df.m %>% 
  inner_join(medians, by='month') %>%
  mutate(prcp = ifelse(prcp == 0, x, prcp)) %>%
  select(prcp, date)

result

yields

       prcp       date
1 121.00485 1975-01-31
2 122.41667 1975-02-28
3  82.74026 1975-03-31
4 104.63514 1975-04-30
5  57.46667 1975-05-31
6  38.97297 1975-06-30
7  57.47287 1976-06-30
8  54.16667 1976-07-31
9  70.00000 1976-08-31

I created small datasets with some zero values and added one line of code:

#create sample data    
prcp <- c(1.5,0.0,0.0,2.1)
date <- c(01,02,03,04)
x <- c(1.11,2.22,3.33,4.44)

df <- data.frame(prcp,date)
grp <- data.frame(x,date)

#Make the assignment
df[df$prcp == 0,]$prcp <- grp[df$prcp == 0,]$x
