Replacing a value with user defined function in grouped data
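I have a problem replacing one value with another when a condition is met. I wrote my own function, data_manip, so that I can assign or add other conditions as needed.
However, when I use data_manip, it changes every value in the group to the assigned value, even though the other values in that group do not meet the condition.
Here is what I tried: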
library(dplyr)
df <- data.frame(percent = c(0.6, 0.7, 1, 0.5, 0.5, 1, 0.4, 0.6, 1),
                 type = rep(c("good", "bad", "ugly"), each = 3),
                 smoke = rep(c('Visky', 'Wine', 'Wine'), 3),
                 sex = rep(c('male', 'male', 'female'), 3))
> df
percent type smoke sex
1 0.6 good Visky male
2 0.7 good Wine male
3 1.0 good Wine female
4 0.5 bad Visky male
5 0.5 bad Wine male
6 1.0 bad Wine female
7 0.4 ugly Visky male
8 0.6 ugly Wine male
9 1.0 ugly Wine female
data_manip <- function(x,gr){
if(grepl('goo|ug',gr)&&x<1){
x[x==0.6] <- 1
}
else
x
}
df%>%
group_by(type)%>%
mutate(percent_new=data_manip(percent,type))
which gives
# A tibble: 9 x 5
# Groups: type [3]
percent type smoke sex percent_new
<dbl> <fctr> <fctr> <fctr> <dbl>
1 0.6 good Visky male 1.0
2 0.7 good Wine male 1.0
3 1.0 good Wine female 1.0
4 0.5 bad Visky male 0.5
5 0.5 bad Wine male 0.5
6 1.0 bad Wine female 1.0
7 0.4 ugly Visky male 1.0
8 0.6 ugly Wine male 1.0
9 1.0 ugly Wine female 1.0
I want to keep the original percent values wherever the condition does not hold.
Expected output:
# A tibble: 9 x 5
# Groups: type [3]
percent type smoke sex percent_new
<dbl> <fctr> <fctr> <fctr> <dbl>
1 0.6 good Visky male 1.0
2 0.7 good Wine male 0.7
3 1.0 good Wine female 1.0
4 0.5 bad Visky male 0.5
5 0.5 bad Wine male 0.5
6 1.0 bad Wine female 1.0
7 0.4 ugly Visky male 0.4
8 0.6 ugly Wine male 1.0
9 1.0 ugly Wine female 1.0
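Your current data_manip function is not vectorized: it uses if (cond) { ... } else { ... }, which checks only a single value. With &&, older versions of R silently use just the first element of each vector (recent versions of R raise an error instead). In addition, when the condition is TRUE, the last expression evaluated is the assignment x[x==0.6] <- 1, whose value is 1, so the function returns a single 1 that mutate recycles across the whole group. A vectorized version of the function looks like this: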
data_manip <- function(x,gr){
ifelse(grepl('goo|ug', gr) & x == 0.6, 1, x)
}
and it gives the expected result:
> df%>%
+ group_by(type)%>%
+ mutate(percent_new=data_manip(percent,type))
# A tibble: 9 x 5
# Groups: type [3]
percent type smoke sex percent_new
<dbl> <fctr> <fctr> <fctr> <dbl>
1 0.6 good Visky male 1.0
2 0.7 good Wine male 0.7
3 1.0 good Wine female 1.0
4 0.5 bad Visky male 0.5
5 0.5 bad Wine male 0.5
6 1.0 bad Wine female 1.0
7 0.4 ugly Visky male 0.4
8 0.6 ugly Wine male 1.0
9 1.0 ugly Wine female 1.0
Use ifelse for vectorized condition checks.
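To make the difference concrete, here is a minimal base R sketch comparing the shape of what the original logic returns with a repaired version. The names first_only and data_manip_fixed are hypothetical, and the explicit [1] is an assumption that mimics how older R versions treated && on vectors.

```r
# Hypothetical stand-in for the original function: the explicit [1] mimics
# what `&&` did silently in older R versions.
first_only <- function(x, gr) {
  if (grepl("goo|ug", gr[1]) && x[1] < 1) {
    x[x == 0.6] <- 1   # the value of an assignment is its right-hand side
  } else {
    x
  }
}

# Possible repair: modify the matching elements, then return the vector.
data_manip_fixed <- function(x, gr) {
  hit <- grepl("goo|ug", gr) & x == 0.6
  x[hit] <- 1
  x
}

good <- c(0.6, 0.7, 1)
res_orig  <- first_only(good, rep("good", 3))
res_fixed <- data_manip_fixed(good, rep("good", 3))
print(res_orig)    # a single value, which mutate() would recycle
print(res_fixed)   # same length as the input
```

Logical indexing followed by an explicit return of x is an alternative to ifelse; which one reads better is largely a matter of taste.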
This looks like a problem where case_when is useful.
Try this:
library(tidyverse)
df %>%
  mutate(new_percentage = case_when(type == "good" & percent == 0.6 ~ 1,
                                    type == "ugly" & percent == 0.6 ~ 1,
                                    TRUE ~ percent))
This gives:
# A tibble: 9 x 5
percent type smoke sex new_percentage
<dbl> <fctr> <fctr> <fctr> <dbl>
1 0.6 good Visky male 1.0
2 0.7 good Wine male 0.7
3 1.0 good Wine female 1.0
4 0.5 bad Visky male 0.5
5 0.5 bad Wine male 0.5
6 1.0 bad Wine female 1.0
7 0.4 ugly Visky male 0.4
8 0.6 ugly Wine male 1.0
9 1.0 ugly Wine female 1.0
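Both answers test percent == 0.6 exactly. That works here because the values are typed in as literals. If percent were the result of a calculation, a comparison with a tolerance may be safer. The following is only a sketch: the helper name replace_near and the tolerance of 1e-8 are assumptions, not part of either answer.

```r
# Hypothetical helper: like the ifelse() version, but compares with a tolerance.
replace_near <- function(x, gr, target = 0.6, tol = 1e-8) {
  hit <- grepl("goo|ug", gr) & abs(x - target) < tol
  ifelse(hit, 1, x)
}

# The first value is mathematically 0.6 but is produced by arithmetic.
computed <- c(0.2 + 0.4, 0.7, 1)
res_near <- replace_near(computed, rep("good", 3))
print(res_near)
```

dplyr also provides near() for the same purpose, which could be used inside case_when in place of ==.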