
Replacing a value with user defined function in grouped data

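I have a problem replacing a value with another one when a condition is met. I use my own function, data_manip, so that I can assign or add any other conditions when needed.

However, when I try to use this data_manip function, it changes all the values inside the group to the assigned value, even though the other values in that group do not meet the condition.

Here is what I tried: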

df <- data.frame(percent = c(0.6, 0.7,1, 0.5,0.5,1,0.4,0.6,1), 
                 type = rep(c("good", "bad","ugly"),each=3), smoke=rep(c('Visky','Wine','Wine'),3),
                 sex=rep(c('male','male','female'),3))

> df
  percent type smoke    sex
1     0.6 good Visky   male
2     0.7 good  Wine   male
3     1.0 good  Wine female
4     0.5  bad Visky   male
5     0.5  bad  Wine   male
6     1.0  bad  Wine female
7     0.4 ugly Visky   male
8     0.6 ugly  Wine   male
9     1.0 ugly  Wine female


data_manip <- function(x,gr){
  if(grepl('goo|ug',gr)&&x<1){
    x[x==0.6] <- 1
  }
    else
  x
}

library(dplyr)

df%>%
  group_by(type)%>%
  mutate(percent_new=data_manip(percent,type))

# A tibble: 9 x 5
# Groups:   type [3]
  percent   type  smoke    sex percent_new
    <dbl> <fctr> <fctr> <fctr>       <dbl>
1     0.6   good  Visky   male         1.0
2     0.7   good   Wine   male         1.0
3     1.0   good   Wine female         1.0
4     0.5    bad  Visky   male         0.5
5     0.5    bad   Wine   male         0.5
6     1.0    bad   Wine female         1.0
7     0.4   ugly  Visky   male         1.0
8     0.6   ugly   Wine   male         1.0
9     1.0   ugly   Wine female         1.0

I want to keep the original percent values where the condition does not apply to them.

Expected output

# A tibble: 9 x 5
# Groups:   type [3]
  percent   type  smoke    sex percent_new
    <dbl> <fctr> <fctr> <fctr>       <dbl>
1     0.6   good  Visky   male         1.0
2     0.7   good   Wine   male         0.7
3     1.0   good   Wine female         1.0
4     0.5    bad  Visky   male         0.5
5     0.5    bad   Wine   male         0.5
6     1.0    bad   Wine female         1.0
7     0.4   ugly  Visky   male         0.4
8     0.6   ugly   Wine   male         1.0
9     1.0   ugly   Wine female         1.0

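Your current data_manip function is not vectorized, because it uses if (cond) { ... } else { ... }, which checks only a single value. Given a vector, older R versions use only its first element, and newer versions raise an error. In addition, when the condition holds, the last expression evaluated is the assignment x[x==0.6] <- 1, whose value is 1, so the function returns 1 rather than the modified x, and mutate fills the whole group with it. A vectorized version of the function looks like this: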

data_manip <- function(x,gr){
    ifelse(grepl('goo|ug', gr) & x == 0.6, 1, x)
}

This gives the expected result:

> df%>%
+     group_by(type)%>%
+     mutate(percent_new=data_manip(percent,type))
# A tibble: 9 x 5
# Groups:   type [3]
  percent   type  smoke    sex percent_new
    <dbl> <fctr> <fctr> <fctr>       <dbl>
1     0.6   good  Visky   male         1.0
2     0.7   good   Wine   male         0.7
3     1.0   good   Wine female         1.0
4     0.5    bad  Visky   male         0.5
5     0.5    bad   Wine   male         0.5
6     1.0    bad   Wine female         1.0
7     0.4   ugly  Visky   male         0.4
8     0.6   ugly   Wine   male         1.0
9     1.0   ugly   Wine female         1.0

Use ifelse for vectorized condition checks.
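To make the difference concrete, here is a minimal base R sketch of the two behaviours on the "good" group alone. The names data_manip_first_only and data_manip_fixed are hypothetical and not from the question. The first function writes gr[1] and x[1] explicitly to mimic what older R versions did with a vector condition, which is an assumption made so the sketch also runs on newer R versions.

```r
# Hypothetical re-creation of the question's function. gr[1] and x[1] spell out
# the "first element only" behaviour of older R versions (assumption: newer
# versions would raise an error for a length > 1 condition instead).
data_manip_first_only <- function(x, gr) {
  if (grepl("goo|ug", gr[1]) && x[1] < 1) {
    x[x == 0.6] <- 1  # the value of an assignment is its right-hand side, here 1
  } else {
    x
  }
}

# Same intent, but the condition is evaluated element by element and the
# modified vector itself is returned.
data_manip_fixed <- function(x, gr) {
  hit <- grepl("goo|ug", gr) & x == 0.6
  x[hit] <- 1
  x
}

good <- c(0.6, 0.7, 1)
res_orig  <- data_manip_first_only(good, rep("good", 3))
res_fixed <- data_manip_fixed(good, rep("good", 3))

print(res_orig)   # [1] 1
print(res_fixed)  # [1] 1.0 0.7 1.0
```

The single 1 returned by the first version is what gets recycled across the whole group in the question's output. The second version is an alternative to ifelse that keeps the assignment style of the original function.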

This looks like a problem where case_when is useful.

Try this:

library(tidyverse)

df %>% 
  mutate(new_percentage = case_when(type == "good" & percent == 0.6 ~ 1,
                                    type == "ugly" & percent == 0.6 ~ 1,
                                    # otherwise keep the original value
                                    TRUE ~ percent))

This gives:

# A tibble: 9 x 5
  percent   type  smoke    sex new_percentage
    <dbl> <fctr> <fctr> <fctr>          <dbl>
1     0.6   good  Visky   male            1.0
2     0.7   good   Wine   male            0.7
3     1.0   good   Wine female            1.0
4     0.5    bad  Visky   male            0.5
5     0.5    bad   Wine   male            0.5
6     1.0    bad   Wine female            1.0
7     0.4   ugly  Visky   male            0.4
8     0.6   ugly   Wine   male            1.0
9     1.0   ugly   Wine female            1.0
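If more types need the same replacement, the two conditions can be folded into one with %in%. The following is only a sketch under that assumption: the data frame is rebuilt here with just the two relevant columns, and res is a hypothetical name.

```r
library(dplyr)

# Reduced copy of the question's data: only the columns the rule depends on.
df <- data.frame(
  percent = c(0.6, 0.7, 1, 0.5, 0.5, 1, 0.4, 0.6, 1),
  type    = rep(c("good", "bad", "ugly"), each = 3)
)

res <- df %>%
  mutate(new_percentage = case_when(
    # one condition covers every type that should be replaced
    type %in% c("good", "ugly") & percent == 0.6 ~ 1,
    # all other rows keep their original value
    TRUE ~ percent
  ))

print(res$new_percentage)
```

Because case_when works row by row, no group_by is needed for this rule.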

