
What is the most efficient way to correct text type data in a column?

fito <- c("forest", "savaaaana", "brae soil", "bare soil", "savanna", "froest")
id <- 1:6

df <- data.frame(fito = as.factor(fito), id = id)

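What is the smartest way to replace the mistyped values ("savaaaana", "brae soil", "froest") with the correct ones ("savanna", "bare soil", "forest")?

At the start I have six factor levels. Only three of them are correct.

How can I do this with the tidyverse package?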

You can try:

library(dplyr)    # %>% and mutate()
library(forcats)  # fct_collapse()

df2 <- df %>% mutate(fito = fct_collapse(fito, savanna = c("savaaaana", "savanna"),
                                 `bare soil` = c("brae soil","bare soil"),
                                 forest = c("forest","froest" )))
 
str(df2)
'data.frame':   6 obs. of  2 variables:
 $ fito: Factor w/ 3 levels "bare soil","forest",..: 2 3 1 1 3 2
 $ id  : int  1 2 3 4 5 6
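
For comparison, the same many-to-one mapping can be sketched in base R, without any packages. This is not part of the answer above. It relies on the replacement form of `levels()` accepting a named list, where each name is a target level and each element lists the old levels to merge into it. The name `df3` is made up for this illustration.

```r
# Rebuild the example data from the question.
fito <- c("forest", "savaaaana", "brae soil", "bare soil", "savanna", "froest")
df <- data.frame(fito = as.factor(fito), id = 1:6)

# Work on a copy so the original df stays untouched (df3 is a hypothetical name).
df3 <- df

# Named list: name = corrected level, value = all spellings that map to it.
levels(df3$fito) <- list(
  "savanna"   = c("savaaaana", "savanna"),
  "bare soil" = c("brae soil", "bare soil"),
  "forest"    = c("forest", "froest")
)

nlevels(df3$fito)  # number of levels after merging
```

Any level not mentioned in the list would become NA, so every spelling that should survive has to appear somewhere in the mapping.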

There are a couple of ways to do this:

library(tidyverse)

old<- c("savaaaana", "brae soil", "froest") 
new<- c("savanna", "bare soil", "forest")
df %>%
   mutate(fito=factor(str_replace_all(fito, set_names(new, old))))
 
      fito id
1    forest  1
2   savanna  2
3 bare soil  3
4 bare soil  4
5   savanna  5
6    forest  6

df %>%
 mutate(fito = lift(fct_recode)(as.list(set_names(old, new)), fito))

       fito id
1    forest  1
2   savanna  2
3 bare soil  3
4 bare soil  4
5   savanna  5
6    forest  6

df %>%
  mutate(fito = invoke(fct_recode, c(list(fito),as.list(set_names(old, new)))))
       fito id
1    forest  1
2   savanna  2
3 bare soil  3
4 bare soil  4
5   savanna  5
6    forest  6
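
`lift()` and `invoke()` were deprecated in purrr 1.0.0, so on a recent tidyverse the two `fct_recode()` calls above may emit deprecation warnings. A minimal sketch of the same recoding without them, assuming a forcats version whose `fct_recode()` accepts a named character vector spliced in with `!!!`; the names `mapping` and `df4` are made up for this illustration:

```r
library(dplyr)    # %>% and mutate()
library(forcats)  # fct_recode()

# Rebuild the example data from the question.
fito <- c("forest", "savaaaana", "brae soil", "bare soil", "savanna", "froest")
df <- data.frame(fito = as.factor(fito), id = 1:6)

old <- c("savaaaana", "brae soil", "froest")
new <- c("savanna", "bare soil", "forest")

# fct_recode() expects new_level = "old_level" pairs, so the names are the
# corrected spellings and the values are the mistyped ones.
mapping <- setNames(old, new)

# !!! splices the named vector into the call as individual arguments.
df4 <- df %>%
  mutate(fito = fct_recode(fito, !!!mapping))

levels(df4$fito)
```

Unlike `str_replace_all()`, which treats the old values as regular expressions, `fct_recode()` matches whole levels exactly and keeps the column a factor throughout.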

