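How to replace factor levels in multiple columns of a data frame based on matches in a lookup data frame using R
I want to replace the levels in df1 that match lab_pt in the data frame lookup_df with the corresponding levels from the second column of lookup_df (here: lab_en), while keeping all the other levels as they are. Thanks a lot!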
--
Main data frame:
df1 <- data.frame(
num_var = sample(200, 15),
col1 = rep(c("onda","estrela","rato","caneta","ceu"), 3),
col2 = rep(c("muro","gato","pa","rato","ceu"), 3),
col3 = rep(c("surf","onda","dente","onda","sei"), 3),
col3 = rep(c("onda","casa",NA,"nao","net"), 3))
Lookup data frame:
lookup_df <- data.frame(
lab_pt = c("onda","estrela","rato","caneta","ceu"),
lab_en = c("wave","star","rat","pen","sky"))
I tried the code below. It does the job, but the values that have no match are converted to NA, which I don't want.
rownames(lookup_df) <- lookup_df$lab_pt
apply(df1[,2:ncol(df1)], 2, function(x) lookup_df[as.character(x),]$lab_en)
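The NAs appear because indexing a data frame by a row name that does not exist returns a row of NA. The following is a minimal sketch of that behaviour on one column, assuming character (not factor) columns; `by_rowname` and `unmatched` are names made up for this illustration.

```r
lookup_df <- data.frame(
  lab_pt = c("onda", "estrela", "rato", "caneta", "ceu"),
  lab_en = c("wave", "star", "rat", "pen", "sky"),
  stringsAsFactors = FALSE)
rownames(lookup_df) <- lookup_df$lab_pt

col2 <- c("muro", "gato", "pa", "rato", "ceu")

# Row-name indexing: words missing from the lookup come back as NA
by_rowname <- lookup_df[col2, ]$lab_en
print(by_rowname)

# Words that have no translation and are therefore lost by this approach
unmatched <- setdiff(col2, lookup_df$lab_pt)
print(unmatched)
```

Listing the unmatched words first is a cheap way to see how much information the row-name approach would discard.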
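The post "Replace values in a dataframe based on lookup table" is very similar, but in that case every level can be matched, unlike here. Thanks a lot!
I think this should be done with the data.table package. It does reorder the ids; is that a problem?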
# added seed
# changed col3 to col4
set.seed(1)
df1 <- data.frame(
num_var = sample(200, 15),
col1 = rep(c("onda","estrela","rato","caneta","ceu"), 3),
col2 = rep(c("muro","gato","pa","rato","ceu"), 3),
col3 = rep(c("surf","onda","dente","onda","sei"), 3),
col4 = rep(c("onda","casa",NA,"nao","net"), 3))
lookup_df <- data.frame(
lab_pt = c("onda","estrela","rato","caneta","ceu"),
lab_en = c("wave","star","rat","pen","sky"))
# data.table solution
library(data.table)
# change from wide to long, to make merge easier
dt <- melt(as.data.table(df1), id.vars="num_var")
# merge in the new values to original data
dt2 <- merge(dt, lookup_df, by.x="value", by.y="lab_pt",
all.x=TRUE)
# if there is no match (lab_en is NA), keep the original value
dt2[is.na(lab_en), lab_en := value]
# convert back from long to wide
dt3 <- dcast(dt2[, .(num_var, variable, lab_en)], num_var~variable,
value.var="lab_en")
# back to data.frame
output <- as.data.frame(dt3)
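Whenever you merge tables, it is usually better to work with long-format data, where there is one group column and one value column. That way you don't need to run the same operation (the merge) several times.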
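The same long-format idea can be sketched in base R without reordering the rows. This is not the data.table code above but an analogue under stated assumptions: the columns are character (stringsAsFactors = FALSE), and `cols`, `long`, `hit` and `out_base` are names made up for this illustration.

```r
set.seed(1)
df1 <- data.frame(
  num_var = sample(200, 15),
  col1 = rep(c("onda", "estrela", "rato", "caneta", "ceu"), 3),
  col4 = rep(c("onda", "casa", NA, "nao", "net"), 3),
  stringsAsFactors = FALSE)
lookup_df <- data.frame(
  lab_pt = c("onda", "estrela", "rato", "caneta", "ceu"),
  lab_en = c("wave", "star", "rat", "pen", "sky"),
  stringsAsFactors = FALSE)

cols <- setdiff(names(df1), "num_var")
# Stack every text column into one long vector (column by column)
long <- unlist(df1[cols], use.names = FALSE)
# A single lookup for all values; NA in hit means "no translation"
hit <- match(long, lookup_df$lab_pt)
long <- ifelse(is.na(hit), long, lookup_df$lab_en[hit])
# Fold the long vector back into columns; the row order is untouched
out_base <- df1
out_base[cols] <- as.data.frame(
  matrix(long, nrow = nrow(df1), dimnames = list(NULL, cols)),
  stringsAsFactors = FALSE)
print(out_base)
```

Because the values are only stacked and unstacked by position, num_var keeps its original order, which sidesteps the reordering question.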
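I think this might help you. It creates a new column, but it gets the job done. Note that it only handles col1, and values without a match still come back as NA: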
df1$new <- lookup_df[match(df1$col1, lookup_df$lab_pt),2]
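To keep the unmatched values and cover every text column, the match() call can be wrapped in a small helper. This is a sketch, assuming character columns; `translate_keep` and `text_cols` are hypothetical names, not part of the answer above.

```r
df1 <- data.frame(
  num_var = 1:5,
  col1 = c("onda", "estrela", "rato", "caneta", "ceu"),
  col2 = c("muro", "gato", "pa", "rato", "ceu"),
  stringsAsFactors = FALSE)
lookup_df <- data.frame(
  lab_pt = c("onda", "estrela", "rato", "caneta", "ceu"),
  lab_en = c("wave", "star", "rat", "pen", "sky"),
  stringsAsFactors = FALSE)

# Hypothetical helper: translate matches, leave everything else as it is
translate_keep <- function(x, from, to) {
  idx <- match(x, from)                  # NA where x has no entry in `from`
  x[!is.na(idx)] <- to[idx[!is.na(idx)]]
  x
}

text_cols <- names(df1)[sapply(df1, is.character)]
df1[text_cols] <- lapply(df1[text_cols], translate_keep,
                         from = lookup_df$lab_pt, to = lookup_df$lab_en)
print(df1)
```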
You can do the following:
lookup_vec <- setNames(as.character(lookup_df[["lab_en"]]), lookup_df[["lab_pt"]])
# onda estrela rato caneta ceu
# "wave" "star" "rat" "pen" "sky"
factors_vars <- names(df1)[sapply(df1, is.factor)]
for (var in factors_vars) {
w <- which(levels(df1[[var]]) %in% names(lookup_vec)) # Get only those that are "matchable"
levels(df1[[var]])[w] <- lookup_vec[levels(df1[[var]])[w]]
}
df1
   num_var col1 col2  col3 col3.1
1       21 wave muro  surf   wave
2      104 star gato  wave   casa
3       60  rat   pa dente   <NA>
4      183  pen  rat  wave    nao
5      123  sky  sky   sei    net
6       17 wave muro  surf   wave
7       34 star gato  wave   casa
8      126  rat   pa dente   <NA>
9      139  pen  rat  wave    nao
10      35  sky  sky   sei    net
11     149 wave muro  surf   wave
12       8 star gato  wave   casa
13      46  rat   pa dente   <NA>
14      32  pen  rat  wave    nao
15     162  sky  sky   sei    net
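The loop above only touches factor columns. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so on a recent R the factor selection would come back empty unless the factors are created explicitly. A minimal sketch with factors requested explicitly; the two-column data and the hand-written `lookup_vec` are assumptions made to keep the example short.

```r
df1 <- data.frame(
  col2 = c("muro", "gato", "pa", "rato", "ceu"),
  col4 = c("onda", "casa", NA, "nao", "net"),
  stringsAsFactors = TRUE)   # request factors explicitly
lookup_vec <- c(onda = "wave", estrela = "star", rato = "rat",
                caneta = "pen", ceu = "sky")

for (var in names(df1)[sapply(df1, is.factor)]) {
  lv <- levels(df1[[var]])
  w <- which(lv %in% names(lookup_vec))   # only the matchable levels
  levels(df1[[var]])[w] <- lookup_vec[lv[w]]
}
print(df1)
```

Renaming levels leaves the underlying integer codes alone, so unmatched levels and NA entries pass through unchanged.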
Here is a solution using the dplyr package. Note the argument stringsAsFactors = F, which keeps the words as character strings.
df1 <- data.frame(
num_var = sample(200, 15),
col1 = rep(c("onda","estrela","rato","caneta","ceu"), 3),
col2 = rep(c("muro","gato","pa","rato","ceu"), 3),
col3 = rep(c("surf","onda","dente","onda","sei"), 3),
col3 = rep(c("onda","casa",NA,"nao","net"), 3), stringsAsFactors = F)
lookup_df <- data.frame(
lab_pt = c("onda","estrela","rato","caneta","ceu"),
lab_en = c("wave","star","rat","pen","sky"), stringsAsFactors = F)
library(dplyr)
# match() finds each word's row in the lookup; coalesce() falls back to the original word
df1 %>% mutate(col1 = coalesce(lookup_df$lab_en[match(col1, lookup_df$lab_pt)], col1)) %>%
mutate(col2 = coalesce(lookup_df$lab_en[match(col2, lookup_df$lab_pt)], col2)) %>%
mutate(col3 = coalesce(lookup_df$lab_en[match(col3, lookup_df$lab_pt)], col3)) %>%
mutate(col3.1 = coalesce(lookup_df$lab_en[match(col3.1, lookup_df$lab_pt)], col3.1))
I admit that using one line per column of the data frame is a bit tedious. I could not find a way to do this for all columns at once.
   num_var col1 col2  col3 col3.1
1        6 wave muro  surf   wave
2       84 star gato  wave   casa
3      146  rat   pa dente   <NA>
4      133  pen  rat  wave    nao
5       47  sky  sky   sei    net
6      116 wave muro  surf   wave
7       81 star gato  wave   casa
8      118  rat   pa dente   <NA>
9      186  pen  rat  wave    nao
10     161  sky  sky   sei    net
11     135 wave muro  surf   wave
12      31 star gato  wave   casa
13     174  rat   pa dente   <NA>
14     187  pen  rat  wave    nao
15     178  sky  sky   sei    net
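The per-column calls above can probably be collapsed with across(), available in dplyr 1.0.0 and later. The sketch below uses match() with coalesce() rather than replace(), so each word is looked up by value instead of by position; `translate` is a hypothetical helper, and the column is named col4 here to avoid the duplicated col3.

```r
library(dplyr)

df1 <- data.frame(
  num_var = 1:5,
  col1 = c("onda", "estrela", "rato", "caneta", "ceu"),
  col2 = c("muro", "gato", "pa", "rato", "ceu"),
  col4 = c("onda", "casa", NA, "nao", "net"),
  stringsAsFactors = FALSE)
lookup_df <- data.frame(
  lab_pt = c("onda", "estrela", "rato", "caneta", "ceu"),
  lab_en = c("wave", "star", "rat", "pen", "sky"),
  stringsAsFactors = FALSE)

# Hypothetical helper: look each word up, fall back to the original word
translate <- function(x) {
  coalesce(lookup_df$lab_en[match(x, lookup_df$lab_pt)], x)
}

# Apply the helper to every column except num_var
out <- df1 %>% mutate(across(-num_var, translate))
print(out)
```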
library(tibble)  # provides tibble()
# Fake dataframe
df1 <- tibble(
num_var = sample(200, 15),
col1 = rep(c("onda","estrela","rato","caneta","ceu"), 3),
col2 = rep(c("muro","gato","pa","rato","ceu"), 3),
col3 = rep(c("surf","onda","dente","onda","sei"), 3),
col4 = rep(c("onda","casa",NA,"nao","net"), 3))
# Lookup dictionary dataframe
lookup_dat <- tibble(
lab_pt = c("onda","estrela","rato","caneta","ceu"),
lab_en = c("wave","star","rat","pen","sky"))
#******************************************************************
#
# Translation by replacement of lookup dictionary
# Developed to generate Rmd report with labels of plots in different languages
replace_level <- function(df, lookup_df, col_langu_in, col_langu_out) {
  # Replace levels in df using a reference list held in another data frame:
  # a level that matches col_langu_in is replaced by the value found in the
  # same row of col_langu_out.
  # !!!! col_langu_in and col_langu_out must be quoted column names
  # 1) Build a dictionary-style named vector from the reference df (2 cols)
  lookup_vec <- setNames(as.character(lookup_df[[col_langu_out]]),
                         lookup_df[[col_langu_in]])
  # 2) Iterate over the column names of the main df
  for (i in names(df)) {  # select cols?: names(df)[sapply(df, is.factor)]
    # Convert character columns to factor before the translation
    if (is.character(df[[i]])) df[[i]] <- as.factor(df[[i]])
    # 3) Indices of the levels of column i that appear in the dictionary
    index_match <- which(levels(df[[i]]) %in% names(lookup_vec))
    # 4) Replace the matchable levels found in step 3 with their translations
    levels(df[[i]])[index_match] <-
      lookup_vec[levels(df[[i]])[index_match]]
  }
  return(df)
}
# test here
replace_level(df1, lookup_dat, "lab_pt", "lab_en")