How do I use a loop to replace values with averages based on another column in an R data frame?
If I have the following data frame in R:
Pitcher Pitch.Spin..rpm.
A 2350
A 2400
A 2233
A 1100
B 2145
B 2200
B 2340
B 1050
and I want to write a loop in R that replaces the low values for A and B with their respective means, computed without the bad readings, so that the output would be:
A 2350
A 2400
A 2233
A 2328
B 2145
B 2200
B 2340
B 2228
How would I go about doing that? Below is my attempt. My issue is that I am not sure how to properly reference the Pitcher value in the specific row:
for (i in 1:nrow(data)){
if (data$Pitch.Spin..rpm. < 1500)
data$Pitch.Spin..rpm. <- mean(data$Pitch.Spin..rpm.[Pitcher == {i}],na.rm = TRUE)
}
We could do this with a group-by operation. After grouping by 'Pitcher', mutate the 'Pitch.Spin..rpm.' column by replacing the elements that are less than 1500 with the mean of the remaining values in that group:
library(dplyr)
data <- data %>%
  group_by(Pitcher) %>%
  mutate(`Pitch.Spin..rpm.` = replace(`Pitch.Spin..rpm.`,
      `Pitch.Spin..rpm.` < 1500,
      # average only the readings at or above the 1500 cutoff
      mean(`Pitch.Spin..rpm.`[`Pitch.Spin..rpm.` >= 1500], na.rm = TRUE)))
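As a quick check of the target values, the sketch below recomputes the group means by hand in base R, once without the sub-1500 reading and once with it. The vector names `a` and `b` are only illustrative, and the 1500 cutoff is the one used in the question's attempt.

```r
# Readings copied from the question; the names a and b are illustrative only
a <- c(2350, 2400, 2233, 1100)
b <- c(2145, 2200, 2340, 1050)

# Means that leave out the readings below the 1500 cutoff
mean_a_clean <- mean(a[a >= 1500])   # (2350 + 2400 + 2233) / 3
mean_b_clean <- mean(b[b >= 1500])   # (2145 + 2200 + 2340) / 3

# Means over all readings, low values included, for comparison
mean_a_all <- mean(a)
mean_b_all <- mean(b)

round(c(mean_a_clean, mean_b_clean))  # 2328 2228
c(mean_a_all, mean_b_all)             # 2020.75 1933.75
```

Only the means that exclude the low readings match the 2328 and 2228 shown in the desired output, so the filter has to be applied before averaging.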
A base R solution, with ave.
ave(df$`Pitch.Spin..rpm.`, df$Pitcher, FUN = function(x){
  i <- x < 1500
  if(any(i)) x[i] <- mean(x[!i])
  x
})
#[1] 2350.000 2400.000 2233.000 2327.667 2145.000 2200.000 2340.000
#[8] 2228.333
Now assign this result back to the df's column.
df$Pitch.Spin..rpm. <- ave(df$Pitch.Spin..rpm., df$Pitcher, FUN = function(x){
  i <- x < 1500
  if(any(i)) x[i] <- mean(x[!i])
  x
})
df
# Pitcher Pitch.Spin..rpm.
#1 A 2350.000
#2 A 2400.000
#3 A 2233.000
#4 A 2327.667
#5 B 2145.000
#6 B 2200.000
#7 B 2340.000
#8 B 2228.333
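To make it clearer what ave() is doing here, the following sketch spells out the same idea with split(), lapply() and unsplit(): cut the column into one piece per pitcher, fix each piece, and put the pieces back in the original row order. The helper name `fix_low` and its `cutoff` argument are hypothetical, introduced only for this illustration.

```r
# Sample data from the question
df <- data.frame(
  Pitcher = rep(c("A", "B"), each = 4),
  Pitch.Spin..rpm. = c(2350, 2400, 2233, 1100, 2145, 2200, 2340, 1050)
)

# Hypothetical helper: replace readings below `cutoff` with the mean of the rest
fix_low <- function(x, cutoff = 1500) {
  low <- x < cutoff
  if (any(low)) x[low] <- mean(x[!low])
  x
}

# Split by pitcher, fix each piece, then restore the original row order
pieces <- split(df$Pitch.Spin..rpm., df$Pitcher)
fixed <- unsplit(lapply(pieces, fix_low), df$Pitcher)

# Compare against the ave() call applied with the same helper
same_as_ave <- all.equal(fixed, ave(df$Pitch.Spin..rpm., df$Pitcher, FUN = fix_low))
```

ave() is the shorter way to write this; the explicit version is mainly useful for seeing where the grouping happens.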
An approach using dplyr and ifelse() to replace values is next:
library(dplyr)
#Data
df <- structure(list(Pitcher = c("A", "A", "A", "A", "B", "B", "B",
"B"), Pitch.Spin..rpm. = c(2350L, 2400L, 2233L, 1100L, 2145L,
2200L, 2340L, 1050L)), class = "data.frame", row.names = c(NA,
-8L))
Code:
#Code
df %>% group_by(Pitcher) %>%
  mutate(NewVar = ifelse(Pitch.Spin..rpm. < 1500, NA, Pitch.Spin..rpm.),
         Mean = mean(NewVar, na.rm = TRUE),
         Pitch.Spin..rpm. = ifelse(is.na(NewVar), Mean, Pitch.Spin..rpm.)) %>%
  select(-c(NewVar, Mean))
Output:
# A tibble: 8 x 2
# Groups: Pitcher [2]
Pitcher Pitch.Spin..rpm.
<chr> <dbl>
1 A 2350
2 A 2400
3 A 2233
4 A 2328.
5 B 2145
6 B 2200
7 B 2340
8 B 2228.
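The three steps inside that mutate() call can be followed on a single pitcher's readings in base R. This is only a sketch of the masking idea; the variable names `x`, `masked`, `group_mean` and `result` are illustrative and not part of the answer above.

```r
# Pitcher A's readings from the question
x <- c(2350, 2400, 2233, 1100)

# Step 1: mask the low readings as NA (ifelse works element by element)
masked <- ifelse(x < 1500, NA, x)

# Step 2: average what is left, skipping the NA entries
group_mean <- mean(masked, na.rm = TRUE)

# Step 3: fill the masked positions with that mean, keep everything else
result <- ifelse(is.na(masked), group_mean, x)
```

Unlike `if`, which expects a single TRUE or FALSE, ifelse() takes a whole logical vector, which is why it can be used on the column directly.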
A way to do this with a loop is next, but you have to save the results in a list:
#Unique pitcher
val <- unique(df$Pitcher)
#Create empty list
List <- list()
#Loop
for(i in val)
{
  #Isolate data
  data1 <- subset(df, Pitcher == i)
  #Compute mean
  meanval <- mean(data1$Pitch.Spin..rpm.[!data1$Pitch.Spin..rpm. < 1500])
  #Replace
  data1$Pitch.Spin..rpm.[data1$Pitch.Spin..rpm. < 1500] <- meanval
  #Save in list
  List[[i]] <- data1
}
#Now bind the list
newdf <- do.call(rbind, List)
rownames(newdf) <- NULL
Output:
Pitcher Pitch.Spin..rpm.
1 A 2350.000
2 A 2400.000
3 A 2233.000
4 A 2327.667
5 B 2145.000
6 B 2200.000
7 B 2340.000
8 B 2228.333
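For a loop that goes row by row, as in the question's attempt, the Pitcher of the current row can be referenced as `df$Pitcher[i]`. The sketch below is one possible way to write that, assuming the same 1500 cutoff; the names `spin`, `bad` and `same_pitcher` are illustrative only.

```r
# Sample data from the question
df <- data.frame(
  Pitcher = rep(c("A", "B"), each = 4),
  Pitch.Spin..rpm. = c(2350, 2400, 2233, 1100, 2145, 2200, 2340, 1050)
)

# Keep a copy of the original readings so replacements never feed into a later mean
spin <- df$Pitch.Spin..rpm.
bad <- spin < 1500

for (i in seq_len(nrow(df))) {
  if (bad[i]) {
    # Rows that belong to the same pitcher as row i
    same_pitcher <- df$Pitcher == df$Pitcher[i]
    # Mean of that pitcher's good readings replaces the bad one
    df$Pitch.Spin..rpm.[i] <- mean(spin[same_pitcher & !bad])
  }
}
```

The two changes from the original attempt are that the condition is tested for one row at a time (`bad[i]` rather than the whole column) and that only row `i` is assigned, not the entire column.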