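Mutate entries in a column in dplyr
I have the following data frame in R. Is there a way in dplyr to clean this column so that all "y" or "yes" entries appear as "Yes" (and, similarly, all "nop" entries appear as "No")?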
structure(list(has_elevator = c("Yes", "y", "y", "yes", "y",
"Yes", "yes", "y", "Yes", "yes", "yes", "Yes", "Yes", "y", "Yes",
"No", "Yes", "No", "y", "nop", "Yes", "yes", "Yes", "No", "Yes",
"y", "Yes", "yes", "nop", "yes", "Yes", "nop", "yes", "Yes",
"y", "y", "Yes", "no", "y", "Yes", "nop", "y", "y", "y", "No",
"no", "y", "y", "Yes", "no")), class = "data.frame", row.names = c(NA,
-50L))
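Here is another approach: we can use str_detect() with a regex() pattern that sets ignore_case = TRUE, wrapped in an ifelse() call. Any value containing a "y" (in either case) becomes "Yes"; every other non-missing value becomes "No".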
library(dplyr)
library(stringr)
# Values that contain "y" (case-insensitive) become "Yes", the rest "No"
df %>%
  mutate(has_elevator = ifelse(str_detect(has_elevator, regex("y", ignore_case = TRUE)), "Yes", "No"))
has_elevator
1 Yes
2 Yes
3 Yes
4 Yes
5 Yes
6 Yes
7 Yes
8 Yes
9 Yes
10 Yes
11 Yes
12 Yes
13 Yes
14 Yes
15 Yes
16 No
17 Yes
18 No
19 Yes
20 No
21 Yes
22 Yes
23 Yes
24 No
25 Yes
26 Yes
27 Yes
28 Yes
29 No
30 Yes
31 Yes
32 No
33 Yes
34 Yes
35 Yes
36 Yes
37 Yes
38 No
39 Yes
40 Yes
41 No
42 Yes
43 Yes
44 Yes
45 No
46 No
47 Yes
48 Yes
49 Yes
50 No
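One caveat with the pattern above: "y" matches anywhere in the string, and anything that does not match is turned into "No". That is fine for this data, but if the column could contain other values, a stricter variant may be safer. Below is a minimal sketch, assuming a hypothetical `toy` data frame (not the data from the question) and anchored patterns, where anything unrecognised is set to NA rather than "No":

```r
library(dplyr)
library(stringr)

# Hypothetical sample: includes a value that merely contains "y" and a missing value
toy <- data.frame(has_elevator = c("Yes", "y", "nop", "no", "maybe", NA))

toy_clean <- toy %>%
  mutate(has_elevator = case_when(
    # Whole string must be "y" or "yes", ignoring case
    str_detect(has_elevator, regex("^y(es)?$", ignore_case = TRUE)) ~ "Yes",
    # Whole string must be "no" or "nop", ignoring case
    str_detect(has_elevator, regex("^nop?$", ignore_case = TRUE)) ~ "No",
    # Anything else (including NA) is left as missing
    TRUE ~ NA_character_
  ))

toy_clean$has_elevator
```

Whether unknown values should become NA or be kept as-is depends on the data; this is just one possible choice.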
You can use case_when() inside mutate() to recode your variable. I also noticed that you have some values that are no instead of No, so I recoded those as well.
# Your example data
df <- structure(list(has_elevator = c("Yes", "y", "y", "yes", "y",
"Yes", "yes", "y", "Yes", "yes", "yes", "Yes", "Yes", "y", "Yes",
"No", "Yes", "No", "y", "nop", "Yes", "yes", "Yes", "No", "Yes",
"y", "Yes", "yes", "nop", "yes", "Yes", "nop", "yes", "Yes",
"y", "y", "Yes", "no", "y", "Yes", "nop", "y", "y", "y", "No",
"no", "y", "y", "Yes", "no")), class = "data.frame", row.names = c(NA,
-50L))
case_when()
library(dplyr)
# Using case_when()
df_new <- df %>% mutate(
  has_elevator = case_when(
    has_elevator %in% c("y", "yes") ~ "Yes",
    has_elevator %in% c("nop", "no") ~ "No",
    TRUE ~ has_elevator
  )
)
df_new$has_elevator %>% table()
#> .
#> No Yes
#> 11 39
recode()
library(dplyr)
df_new <- df %>% mutate(
  has_elevator = recode(has_elevator, y = "Yes", yes = "Yes", nop = "No", no = "No")
)
df_new$has_elevator %>% table()
#> .
#> No Yes
#> 11 39
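Both variants above list each spelling explicitly, so a value such as "Y" or "NOP" would slip through unchanged. A possible way around that, sketched here on a hypothetical `toy` data frame (not the data from the question), is to lower-case the value before comparing it:

```r
library(dplyr)

# Hypothetical sample with mixed-case spellings and one unrelated value
toy <- data.frame(has_elevator = c("Yes", "Y", "yes", "NOP", "no", "No", "unsure"))

toy_clean <- toy %>%
  mutate(has_elevator = case_when(
    # Compare in lower case so "Y", "YES", "Yes" are all caught
    tolower(has_elevator) %in% c("y", "yes") ~ "Yes",
    tolower(has_elevator) %in% c("no", "nop") ~ "No",
    # Leave anything unrecognised untouched
    TRUE ~ has_elevator
  ))

toy_clean$has_elevator
```

The set of accepted spellings is an assumption; adjust it to whatever actually occurs in your column.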
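You can skip recoding values just to fix their case by using a regular expression that automatically capitalizes the first letter of the string, whatever it is. This guards against overlooking a value that differs only in case.
The capitalization step is also base R, so it does not require the stringr package.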
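df_new <- df %>% mutate(
  # Only the abbreviations need explicit recoding; "yes" and "no" are
  # handled by the capitalization step below
  has_elevator = case_when(
    has_elevator %in% c("y") ~ "Yes",
    has_elevator %in% c("nop") ~ "No",
    TRUE ~ has_elevator),
  # Upper-case the first character of every value
  has_elevator = has_elevator %>% sub('^(\\w?)', '\\U\\1', ., perl = TRUE)
)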
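For comparison, the same idea can be written without dplyr at all. This is only a sketch on a hypothetical character vector `x` (not the data frame from the question): recode the two abbreviations first, then capitalize the first letter with sub() and perl = TRUE.

```r
# Hypothetical sample values
x <- c("Yes", "y", "yes", "nop", "no", "No")

# Expand the abbreviations that capitalization alone cannot fix
x[x == "y"] <- "Yes"
x[x == "nop"] <- "No"

# \\U upper-cases the captured first character (requires perl = TRUE)
x <- sub("^(\\w)", "\\U\\1", x, perl = TRUE)

x
```

On a data frame this would be applied to the column, for example by assigning the result back to df$has_elevator.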