
Mutate entries in a column in dplyr

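I have the following data frame in R. Is there a way in dplyr to clean up this column so that all "y" or "yes" entries show as "Yes" (and, similarly, all "nop" entries show as "No")?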

structure(list(has_elevator = c("Yes", "y", "y", "yes", "y", 
"Yes", "yes", "y", "Yes", "yes", "yes", "Yes", "Yes", "y", "Yes", 
"No", "Yes", "No", "y", "nop", "Yes", "yes", "Yes", "No", "Yes", 
"y", "Yes", "yes", "nop", "yes", "Yes", "nop", "yes", "Yes", 
"y", "y", "Yes", "no", "y", "Yes", "nop", "y", "y", "y", "No", 
"no", "y", "y", "Yes", "no")), class = "data.frame", row.names = c(NA, 
-50L))
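Before choosing a mapping, it can help to list every distinct spelling in the column so that none is missed. A minimal base-R sketch, assuming the data above is assigned to df:

```r
# The question's data, assigned to df
df <- structure(list(has_elevator = c("Yes", "y", "y", "yes", "y",
  "Yes", "yes", "y", "Yes", "yes", "yes", "Yes", "Yes", "y", "Yes",
  "No", "Yes", "No", "y", "nop", "Yes", "yes", "Yes", "No", "Yes",
  "y", "Yes", "yes", "nop", "yes", "Yes", "nop", "yes", "Yes",
  "y", "y", "Yes", "no", "y", "Yes", "nop", "y", "y", "y", "No",
  "no", "y", "y", "Yes", "no")), class = "data.frame", row.names = c(NA, -50L))

# Count how often each distinct spelling occurs
spellings <- table(df$has_elevator)
spellings
```

For this data there are six spellings (Yes, y, yes, No, nop and no). Their print order depends on the locale's collation rules.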

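Here is another approach: use str_detect() with a case-insensitive pattern, regex("y", ignore_case = TRUE), wrapped in an ifelse() statement. Every value that contains a "y" becomes "Yes" and everything else becomes "No".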

library(dplyr)
library(stringr)

df %>% 
  # "Yes" if the value contains a "y" (any case), otherwise "No"
  mutate(has_elevator = ifelse(str_detect(has_elevator, regex("y", ignore_case = TRUE)), "Yes", "No"))
   has_elevator
1           Yes
2           Yes
3           Yes
4           Yes
5           Yes
6           Yes
7           Yes
8           Yes
9           Yes
10          Yes
11          Yes
12          Yes
13          Yes
14          Yes
15          Yes
16           No
17          Yes
18           No
19          Yes
20           No
21          Yes
22          Yes
23          Yes
24           No
25          Yes
26          Yes
27          Yes
28          Yes
29           No
30          Yes
31          Yes
32           No
33          Yes
34          Yes
35          Yes
36          Yes
37          Yes
38           No
39          Yes
40          Yes
41           No
42          Yes
43          Yes
44          Yes
45           No
46           No
47          Yes
48          Yes
49          Yes
50           No
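The same idea works without stringr. A minimal base-R sketch on a hypothetical six-value sample: grepl() with ignore.case = TRUE stands in for str_detect(), and the pattern is anchored with ^ (a tightening not in the answer above) so that only values starting with "y" count as "Yes":

```r
# Hypothetical sample containing each spelling from the question
df <- data.frame(has_elevator = c("Yes", "y", "yes", "No", "nop", "no"))

# grepl() is TRUE where the value starts with "y" or "Y"; everything else becomes "No"
df$has_elevator <- ifelse(grepl("^y", df$has_elevator, ignore.case = TRUE), "Yes", "No")

df$has_elevator
```

One difference to keep in mind: grepl() returns FALSE for NA, so a missing value would become "No", whereas str_detect() returns NA and ifelse() keeps it missing.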

You can use case_when() inside mutate() to recode your variable. Because I noticed that some values are no rather than No, I recoded those for you as well.

# Your example data
df <- structure(list(has_elevator = c("Yes", "y", "y", "yes", "y",
  "Yes", "yes", "y", "Yes", "yes", "yes", "Yes", "Yes", "y", "Yes",
  "No", "Yes", "No", "y", "nop", "Yes", "yes", "Yes", "No", "Yes",
  "y", "Yes", "yes", "nop", "yes", "Yes", "nop", "yes", "Yes",
  "y", "y", "Yes", "no", "y", "Yes", "nop", "y", "y", "y", "No",
  "no", "y", "y", "Yes", "no")), class = "data.frame", row.names = c(NA, -50L))

Using case_when()

library(dplyr)

# Using case_when()
df_new <- df %>% mutate(
  has_elevator = case_when(
    has_elevator %in% c("y", "yes") ~ "Yes",
    has_elevator %in% c("nop", "no") ~ "No",
    TRUE ~ has_elevator
  )
)

df_new$has_elevator %>% table()
#> .
#>  No Yes 
#>  11  39
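If the real data may contain other case or whitespace variants, one option is to normalise the value once with tolower() and trimws() and send anything unrecognised to NA, so it shows up instead of passing through silently. A sketch on hypothetical messier input; the helper column key and the extra spelling "n" are illustrative assumptions, not part of the question:

```r
library(dplyr)

# Hypothetical input with mixed case, stray whitespace and one unknown answer
df <- data.frame(has_elevator = c("Yes", " y", "YES", "nop", "No ", "maybe"))

df_new <- df %>%
  mutate(
    # normalise case and surrounding whitespace once
    key = tolower(trimws(has_elevator)),
    has_elevator = case_when(
      key %in% c("y", "yes") ~ "Yes",
      key %in% c("n", "no", "nop") ~ "No",
      TRUE ~ NA_character_  # unrecognised answers become NA so they are easy to find
    )
  ) %>%
  select(-key)

df_new$has_elevator
```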

Using recode()

library(dplyr)

df_new <- df %>% mutate(
  has_elevator = recode(has_elevator, y = "Yes", yes = "Yes", nop = "No", no = "No")
)

df_new$has_elevator %>% table()
#> .
#>  No Yes 
#>  11  39
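In dplyr 1.1.0 and later, recode() is superseded by case_match(), which takes the old values as vectors on the left-hand side of each formula. A sketch assuming dplyr >= 1.1.0, on a hypothetical six-value sample:

```r
library(dplyr)  # case_match() needs dplyr >= 1.1.0

# Hypothetical sample containing each spelling from the question
df <- data.frame(has_elevator = c("Yes", "y", "yes", "No", "nop", "no"))

df_new <- df %>% mutate(
  has_elevator = case_match(
    has_elevator,
    c("y", "yes") ~ "Yes",
    c("nop", "no") ~ "No",
    .default = has_elevator  # values already spelled correctly are kept as-is
  )
)

df_new$has_elevator
```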

Combining string replacement with either function

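You can skip recoding the values into the correct case by using a regular expression that capitalizes the first letter of each string, whatever that letter is. This guards against overlooking a case variant: only the abbreviations "y" and "nop" still need to be spelled out.

The capitalization step uses base R's sub(), so it does not need the stringr package.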

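df_new <- df %>% mutate(
  # expand the abbreviations "y" and "nop"; the case is fixed in the next step
  has_elevator = case_when(
    has_elevator == "y" ~ "yes",
    has_elevator == "nop" ~ "no",
    TRUE ~ has_elevator),
  # capitalize the first letter: \\U (perl = TRUE only) upper-cases the captured character
  has_elevator = sub("^(\\w)", "\\U\\1", has_elevator, perl = TRUE)
)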
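To see what the replacement does on its own, here is a small sketch on a hypothetical vector. \\U only upper-cases the captured first character and leaves the rest of the string untouched, so a value like "YES" would stay "YES"; lower-casing first with tolower() (an addition not in the answer above) makes the result consistent:

```r
x <- c("yes", "no", "Yes", "YES")

# lower-case everything, then upper-case only the first captured character
fixed <- sub("^(\\w)", "\\U\\1", tolower(x), perl = TRUE)
fixed
```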


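Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.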

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM