
How to collapse all factor levels into new levels in R

I want to create a new variable that groups the levels of another factor.

I tried:

library(dplyr)  # provides %>% and mutate()

neigh_agg <- function(df) {
  df %>%
    mutate(
    Neighborhood2 = as.factor(
      ifelse(as.character(Neighborhood) == "Blmngtn", "Neigh_1",
  ifelse(as.character(Neighborhood) == "Blueste", "Neigh_1",
  ifelse(as.character(Neighborhood) == "ClearCr", "Neigh_1", 
  ifelse(as.character(Neighborhood) == "CollgCr", "Neigh_1", 
  ifelse(as.character(Neighborhood) == "Crawfor", "Neigh_1", 
  ifelse(as.character(Neighborhood) == "Gilbert", "Neigh_1", 
  ifelse(as.character(Neighborhood) == "Greens", "Neigh_1", 
  ifelse(as.character(Neighborhood) == "GrnHill", "Neigh_1", 
  ifelse(as.character(Neighborhood) == "NPkVill", "Neigh_1", 
  ifelse(as.character(Neighborhood) == "NWAmes", "Neigh_1", 
  ifelse(as.character(Neighborhood) == "SawyerW", "Neigh_1", 
  ifelse(as.character(Neighborhood) == "Veenker", "Neigh_1", 
  ifelse(as.character(Neighborhood) == "BrDale", "Neigh_2", 
  ifelse(as.character(Neighborhood) == "BrkSide", "Neigh_2", 
  ifelse(as.character(Neighborhood) == "Edwards", "Neigh_2", 
  ifelse(as.character(Neighborhood) == "IDOTRR", "Neigh_2", 
  ifelse(as.character(Neighborhood) == "MeadowV", "Neigh_2", 
  ifelse(as.character(Neighborhood) == "OldTown", "Neigh_2", 
  ifelse(as.character(Neighborhood) == "Sawyer", "Neigh_2", 
  ifelse(as.character(Neighborhood) == "SWISU", "Neigh_2", 
  ifelse(as.character(Neighborhood) == "NAmes", "Neigh_3", 
  ifelse(as.character(Neighborhood) == "Mitchel", "Neigh_3", 
  ifelse(as.character(Neighborhood) == "StoneBr", "Neigh_4", 
  ifelse(as.character(Neighborhood) == "NoRidge", "Neigh_4", 
  ifelse(as.character(Neighborhood) == "NridgHt", "Neigh_4", 
  ifelse(as.character(Neighborhood) == "Somerst", "Neigh_5", 
  ifelse(as.character(Neighborhood) == "Timber", "Neigh_5", 
         "Neigh_5"))))))))))))))))))))))))))))
    )
}

Is there a faster, simpler way?

The data I am using can be found here:

https://d3c33hcgiwev3.cloudfront.net/_fc6ea3b3b1af3f4fd9afb752e85d4299_ames_train.Rdata?Expires=1633651200&Signature=P7oxFR0IzJ2UP73GI0aJVua67DxUlvoWYhXdQwHf2CZefX2J~0KAxosAWMHtHxcKH81l87~uRBS0FqBb2MUA2UCQUWCg3ldR9mBQypVTq4ofv3wwOq3-r7d6hw1zM72FYfX2oRYgsKzTl5ucb9oQVUa~jBOW1tF3sTtL0h-ykr4_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A

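A simpler approach is to create a key/value dataset and join it to your data. Any level that is missing from the key table gets NA in the new column.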

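library(dplyr)
# Key/value table: one row per original level (only two shown; add the rest the same way).
keydat <- tibble(Neighborhood = c("Blmngtn", "Blueste"),
                 Neighborhood2 = c("Neigh_1", "Neigh_1"))
# Join on the character values, then store the new groups as a factor.
df %>%
  mutate(Neighborhood = as.character(Neighborhood)) %>%
  left_join(keydat, by = "Neighborhood") %>%
  mutate(Neighborhood2 = as.factor(Neighborhood2))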
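
If the goal is only to merge levels, a base R alternative to the join is to assign a named list to `levels()`. This is a sketch of a different technique, not the method from the answer above. The `groups` list is transcribed from the nested `ifelse()` calls in the question, and the small `df` is a made-up stand-in for the real data, assuming `Neighborhood` is stored as a factor.

```r
# New level name -> original levels, transcribed from the question's ifelse() chain.
groups <- list(
  Neigh_1 = c("Blmngtn", "Blueste", "ClearCr", "CollgCr", "Crawfor", "Gilbert",
              "Greens", "GrnHill", "NPkVill", "NWAmes", "SawyerW", "Veenker"),
  Neigh_2 = c("BrDale", "BrkSide", "Edwards", "IDOTRR", "MeadowV", "OldTown",
              "Sawyer", "SWISU"),
  Neigh_3 = c("NAmes", "Mitchel"),
  Neigh_4 = c("StoneBr", "NoRidge", "NridgHt"),
  Neigh_5 = c("Somerst", "Timber")
)

# Toy stand-in for the real data (assumption: Neighborhood is a factor).
df <- data.frame(
  Neighborhood = factor(c("Blmngtn", "Edwards", "NAmes",
                          "StoneBr", "Timber", "Sawyer"))
)

# Copy the factor, then collapse its levels in a single assignment.
df$Neighborhood2 <- df$Neighborhood
levels(df$Neighborhood2) <- groups

# Cross-tabulate old against new levels to check the mapping.
table(df$Neighborhood, df$Neighborhood2)
```

Note that any level not named in `groups` becomes `NA` here, whereas the nested `ifelse()` in the question sends everything else to `"Neigh_5"`, so list every level you want to keep.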

