How to use a loop to replace values based on the mean of another column in an R data frame

Question

If I have the following data frame in R:

Pitcher Pitch.Spin..rpm. 
A     2350
A     2400
A     2233
A     1100
B     2145
B     2200
B     2340
B     1050

I want to write a loop in R that replaces the low values for A and B with each pitcher's own mean, where that mean is computed without the bad readings.

How would I go about doing that? Below is my attempt. My issue is that I am not sure how to properly reference the Pitcher value in the specific row.

for (i in 1:nrow(data)){
  if (data$Pitch.Spin..rpm. < 1500)
  data$Pitch.Spin..rpm. <- mean(data$Pitch.Spin..rpm.[Pitcher == {i}],na.rm = TRUE)
}

Answer 1

We could do this with a group-by operation. After grouping by 'Pitcher', mutate 'Pitch.Spin..rpm.' by using replace to swap the elements that are less than 1500 for the mean of that pitcher's remaining readings. The mean is taken over the values at or above 1500 only, so the bad readings do not pull it down.

library(dplyr)

# Replace readings below 1500 with the mean of the same pitcher's valid readings
data <- data %>%
   group_by(Pitcher) %>%
   mutate(`Pitch.Spin..rpm.` = replace(`Pitch.Spin..rpm.`, 
        `Pitch.Spin..rpm.` < 1500, 
        mean(`Pitch.Spin..rpm.`[`Pitch.Spin..rpm.` >= 1500], na.rm = TRUE)))
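
As a quick sanity check on which mean the replacement should use, here is a small base R sketch for pitcher A only. The vector is copied from the question's sample data; the names spin_a, mean_all and mean_valid are made up for this illustration.

```r
# Readings for pitcher A, copied from the question's sample data
spin_a <- c(2350, 2400, 2233, 1100)

# Mean over every reading, including the bad 1100 rpm value
mean_all <- mean(spin_a)

# Mean over the valid readings only (at or above the 1500 rpm cutoff)
mean_valid <- mean(spin_a[spin_a >= 1500])

print(mean_all)    # 2020.75
print(mean_valid)  # 2327.667
```

The second value is the one the question asks for, since the low reading is left out before averaging.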

Answer 2

A base R solution, with ave.

ave(df$`Pitch.Spin..rpm.`, df$Pitcher, FUN = function(x){
  i <- x < 1500
  if(any(i)) x[i] <- mean(x[!i])
  x
})
#[1] 2350.000 2400.000 2233.000 2327.667 2145.000 2200.000 2340.000
#[8] 2228.333

Now assign this result back to the df's column.

df$Pitch.Spin..rpm. <- ave(df$Pitch.Spin..rpm., df$Pitcher, FUN = function(x){
  i <- x < 1500
  if(any(i)) x[i] <- mean(x[!i])
  x
})

df
#  Pitcher Pitch.Spin..rpm.
#1       A         2350.000
#2       A         2400.000
#3       A         2233.000
#4       A         2327.667
#5       B         2145.000
#6       B         2200.000
#7       B         2340.000
#8       B         2228.333
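
For readers who have not used ave before, the following is a minimal sketch of what it does on toy data. The vectors x, g and g2 are invented for this illustration. ave applies FUN (mean by default) within each group and returns a vector of the same length and order as its input, which is why the result can be assigned straight back to the column.

```r
# Toy values and two hypothetical groupings
x  <- c(1, 2, 3, 4)
g  <- c("a", "a", "b", "b")
g2 <- c("a", "b", "a", "b")

# Default FUN is mean: each element is replaced by its group's mean
by_g <- ave(x, g)
print(by_g)   # 1.5 1.5 3.5 3.5

# Groups do not need to be contiguous; the original row order is kept
by_g2 <- ave(x, g2)
print(by_g2)  # 2 3 2 3
```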

Answer 3

An approach using dplyr and ifelse() to replace values is next:

library(dplyr)

#Data
df <- structure(list(Pitcher = c("A", "A", "A", "A", "B", "B", "B", 
"B"), Pitch.Spin..rpm. = c(2350L, 2400L, 2233L, 1100L, 2145L, 
2200L, 2340L, 1050L)), class = "data.frame", row.names = c(NA, 
-8L))

Code:

#Code
df %>% group_by(Pitcher) %>%
  mutate(NewVar=ifelse(Pitch.Spin..rpm.<1500,NA,Pitch.Spin..rpm.),
         Mean=mean(NewVar,na.rm=T),
         Pitch.Spin..rpm. = ifelse(is.na(NewVar),Mean,Pitch.Spin..rpm.)) %>%
  select(-c(NewVar,Mean))

Output:

# A tibble: 8 x 2
# Groups:   Pitcher [2]
  Pitcher Pitch.Spin..rpm.
  <chr>              <dbl>
1 A                  2350 
2 A                  2400 
3 A                  2233 
4 A                  2328.
5 B                  2145 
6 B                  2200 
7 B                  2340 
8 B                  2228.

A way to do it with a loop is next, but you have to save the results in a list:

#Unique pitcher
val <- unique(df$Pitcher)
#Create empty list
List <- list()
#Loop
for(i in val)
{
  #Isolate data
  data1 <- subset(df,Pitcher==i)
  #Compute mean
  meanval <- mean(data1$Pitch.Spin..rpm.[!data1$Pitch.Spin..rpm.<1500])
  #Replace
  data1$Pitch.Spin..rpm.[data1$Pitch.Spin..rpm.<1500]<-meanval
  #Save in list
  List[[i]] <- data1
}
#Now bind the list
newdf <- do.call(rbind,List)
rownames(newdf) <- NULL

Output:

  Pitcher Pitch.Spin..rpm.
1       A         2350.000
2       A         2400.000
3       A         2233.000
4       A         2327.667
5       B         2145.000
6       B         2200.000
7       B         2340.000
8       B         2228.333
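
If a row-by-row loop closer to the attempt in the question is preferred, one possible sketch follows. It assumes the sample data from the question, no missing values, and a cutoff of 1500; the names threshold, spin and same_pitcher are introduced here for illustration. The pitcher for the current row is referenced as df$Pitcher[i], and the means are computed from an untouched copy of the readings so that earlier replacements cannot affect later ones.

```r
# Sample data from the question
df <- data.frame(
  Pitcher = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Pitch.Spin..rpm. = c(2350, 2400, 2233, 1100, 2145, 2200, 2340, 1050)
)

threshold <- 1500              # assumed cutoff for a bad reading
spin <- df$Pitch.Spin..rpm.    # untouched copy of the original readings

for (i in seq_len(nrow(df))) {
  if (spin[i] < threshold) {
    # Rows that belong to the same pitcher as row i
    same_pitcher <- df$Pitcher == df$Pitcher[i]
    # Mean of that pitcher's valid readings only
    df$Pitch.Spin..rpm.[i] <- mean(spin[same_pitcher & spin >= threshold])
  }
}

print(df)
```

This loop rescans the whole column for every bad row, so on large data the grouped approaches shown earlier are likely to be faster.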
