
How do I use a loop to replace values with averages based on another column in an R data frame?

If I have the following data frame in R:

Pitcher Pitch.Spin..rpm. 
A     2350
A     2400
A     2233
A     1100
B     2145
B     2200
B     2340
B     1050

and I wanted to write a loop in R to replace the low values in A and B with their respective means, computed without the bad readings, so that the output would be:

A     2350
A     2400
A     2233
A     2328
B     2145
B     2200
B     2340
B     2228

How would I go about doing that? Below is my attempt. My issue comes from not being sure how to properly reference the Pitcher value in the specific row.

for (i in 1:nrow(data)){
  if (data$Pitch.Spin..rpm. < 1500)
  data$Pitch.Spin..rpm. <- mean(data$Pitch.Spin..rpm.[Pitcher == {i}],na.rm = TRUE)
}

We could do this with a group-by operation. After grouping by 'Pitcher', mutate 'Pitch.Spin..rpm.' by replacing the elements that are less than 1500 with the mean of the remaining values in that group. The mean has to leave out the low readings; otherwise they pull it down (for A, 2020.75 instead of the expected 2327.667).

library(dplyr)
data <- data %>%
   group_by(Pitcher) %>%
   # average only the readings at or above 1500, then fill in the low ones
   mutate(`Pitch.Spin..rpm.` = replace(`Pitch.Spin..rpm.`,
        `Pitch.Spin..rpm.` < 1500,
        mean(`Pitch.Spin..rpm.`[`Pitch.Spin..rpm.` >= 1500], na.rm = TRUE))) %>%
   ungroup()
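To see why the low readings have to be left out of the average, here is a small base R sketch that compares the per-pitcher mean with and without them. It rebuilds the sample data from the question; the names `mean_all`, `ok`, and `mean_ok` are only illustrative.

```r
# Sample data from the question
df <- data.frame(
  Pitcher = rep(c("A", "B"), each = 4),
  Pitch.Spin..rpm. = c(2350, 2400, 2233, 1100, 2145, 2200, 2340, 1050)
)

# Mean per pitcher over ALL readings, bad ones included
mean_all <- tapply(df$Pitch.Spin..rpm., df$Pitcher, mean)
mean_all
#       A       B
# 2020.75 1933.75

# Mean per pitcher over the readings at or above 1500 only
ok <- df$Pitch.Spin..rpm. >= 1500
mean_ok <- tapply(df$Pitch.Spin..rpm.[ok], df$Pitcher[ok], mean)
round(mean_ok, 3)
#        A        B
# 2327.667 2228.333
```

Only the second set of means matches the output the question asks for.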

A base R solution, with ave().

ave(df$`Pitch.Spin..rpm.`, df$Pitcher, FUN = function(x){
  i <- x < 1500
  if(any(i)) x[i] <- mean(x[!i])
  x
})
#[1] 2350.000 2400.000 2233.000 2327.667 2145.000 2200.000 2340.000
#[8] 2228.333

Now assign this result back to the df's column.

df$Pitch.Spin..rpm. <- ave(df$Pitch.Spin..rpm., df$Pitcher, FUN = function(x){
  i <- x < 1500
  if(any(i)) x[i] <- mean(x[!i])
  x
})

df
#  Pitcher Pitch.Spin..rpm.
#1       A         2350.000
#2       A         2400.000
#3       A         2233.000
#4       A         2327.667
#5       B         2145.000
#6       B         2200.000
#7       B         2340.000
#8       B         2228.333
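If the column might contain missing values (an assumption; the sample data has none), `x < 1500` yields `NA` for those entries and `if(any(i))` can fail. A minimal NA-tolerant sketch of the same idea follows; `replace_low` and its `threshold` argument are hypothetical names introduced here.

```r
# Replace readings below `threshold` with the mean of the other
# non-missing readings in the same group; NA entries are left as NA.
replace_low <- function(x, threshold = 1500) {
  low <- !is.na(x) & x < threshold
  if (any(low)) x[low] <- mean(x[!low], na.rm = TRUE)
  x
}

# Small made-up example with one missing reading
spin    <- c(2350, NA, 2233, 1100)
pitcher <- c("A", "A", "A", "A")

res <- ave(spin, pitcher, FUN = replace_low)
res
# [1] 2350.0     NA 2233.0 2291.5
```

Note that if every reading in a group is low, there is nothing to average and the replacement value is NaN.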

Next is an approach using dplyr and ifelse() to replace values:

library(dplyr)

#Data
df <- structure(list(Pitcher = c("A", "A", "A", "A", "B", "B", "B", 
"B"), Pitch.Spin..rpm. = c(2350L, 2400L, 2233L, 1100L, 2145L, 
2200L, 2340L, 1050L)), class = "data.frame", row.names = c(NA, 
-8L))

Code:

#Code
df %>% group_by(Pitcher) %>%
  mutate(NewVar = ifelse(Pitch.Spin..rpm. < 1500, NA, Pitch.Spin..rpm.),
         Mean = mean(NewVar, na.rm = TRUE),
         Pitch.Spin..rpm. = ifelse(is.na(NewVar), Mean, Pitch.Spin..rpm.)) %>%
  select(-c(NewVar, Mean))

Output:

# A tibble: 8 x 2
# Groups:   Pitcher [2]
  Pitcher Pitch.Spin..rpm.
  <chr>              <dbl>
1 A                  2350 
2 A                  2400 
3 A                  2233 
4 A                  2328.
5 B                  2145 
6 B                  2200 
7 B                  2340 
8 B                  2228.

Next is a way to do it with a loop, but you have to save the results in a list:

#Unique pitcher
val <- unique(df$Pitcher)
#Create empty list
List <- list()
#Loop
for(i in val)
{
  #Isolate data
  data1 <- subset(df,Pitcher==i)
  #Compute mean
  meanval <- mean(data1$Pitch.Spin..rpm.[!data1$Pitch.Spin..rpm.<1500])
  #Replace
  data1$Pitch.Spin..rpm.[data1$Pitch.Spin..rpm.<1500]<-meanval
  #Save in list
  List[[i]] <- data1
}
#Now bind the list
newdf <- do.call(rbind,List)
rownames(newdf) <- NULL

Output:

  Pitcher Pitch.Spin..rpm.
1       A         2350.000
2       A         2400.000
3       A         2233.000
4       A         2327.667
5       B         2145.000
6       B         2200.000
7       B         2340.000
8       B         2228.333
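For the row-by-row loop the question attempted, the Pitcher value of the current row can be referenced as `df$Pitcher[i]`. The following is a minimal sketch under that reading, using the sample data; `threshold`, `spin`, `fixed`, and `same_pitcher` are illustrative names.

```r
# Sample data from the question
df <- data.frame(
  Pitcher = rep(c("A", "B"), each = 4),
  Pitch.Spin..rpm. = c(2350, 2400, 2233, 1100, 2145, 2200, 2340, 1050)
)

threshold <- 1500
spin  <- df$Pitch.Spin..rpm.  # original readings, never modified
fixed <- spin                 # copy that receives the replacements

for (i in seq_len(nrow(df))) {
  if (spin[i] < threshold) {
    # rows belonging to the same pitcher as row i
    same_pitcher <- df$Pitcher == df$Pitcher[i]
    # mean of that pitcher's good readings only
    fixed[i] <- mean(spin[same_pitcher & spin >= threshold])
  }
}

df$Pitch.Spin..rpm. <- fixed
round(df$Pitch.Spin..rpm., 3)
# [1] 2350.000 2400.000 2233.000 2327.667 2145.000 2200.000 2340.000 2228.333
```

Reading from `spin` and writing to `fixed` keeps earlier replacements from feeding into later means.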
