R - re-order columns based on match (template)
I have a large dataset that looks like this:
  V1    V2       V3   V4
1 Sleep Domestic Eat  Child Care
2 Sleep Domestic Eat  Paid
3 Sleep Domestic Eat  Child Care
4 Sleep Eat      Paid <NA>
What I would like to do is reorder the columns according to a "template"
["Sleep", "Eat", "Domestic", "Paid", "Child care"]
to obtain this output:
V1    V2  V3       V4   V5
Sleep Eat Domestic NA   Child Care
Sleep Eat Domestic Paid NA
Sleep Eat Domestic NA   Child Care
Sleep Eat NA       Paid NA
So Sleep goes in column 1, Eat in column 2, and so on.
I don't know where to start. Any ideas?
Data
x = structure(list(V1 = c("Sleep", "Sleep", "Sleep", "Sleep"), V2 = c("Domestic",
"Domestic", "Domestic", "Eat"), V3 = c("Eat", "Eat", "Eat", "Paid"
), V4 = c("Child Care", "Paid", "Child Care", NA)), .Names = c("V1",
"V2", "V3", "V4"), row.names = c(NA, 4L), class = "data.frame")
template = c('Sleep', 'Eat', 'Domestic', 'Paid', 'Child care')
For each template value, use rowSums to check whether it occurs anywhere in each row, then piece the resulting columns back together:
template <- c("Sleep", "Eat", "Domestic", "Paid", "Child Care")
# I've fixed this template so the case matches the 'Child Care' values in the data
data.frame(lapply(
  setNames(template, seq_along(template)),
  # per row: v if it occurs in any column, otherwise NA
  function(v) c(NA, v)[(rowSums(x == v, na.rm = TRUE) > 0) + 1]
))
#     X1  X2       X3   X4         X5
#1 Sleep Eat Domestic <NA> Child Care
#2 Sleep Eat Domestic Paid       <NA>
#3 Sleep Eat Domestic <NA> Child Care
#4 Sleep Eat     <NA> Paid       <NA>
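The indexing trick in the function above may be easier to follow for a single template value. The sketch below rebuilds the question's data and walks through the "Paid" column step by step; the intermediate names `hit` and `paid_col` are only for illustration and are not part of the answer.

```r
# The question's data, rebuilt so the example is self-contained
x <- data.frame(
  V1 = c("Sleep", "Sleep", "Sleep", "Sleep"),
  V2 = c("Domestic", "Domestic", "Domestic", "Eat"),
  V3 = c("Eat", "Eat", "Eat", "Paid"),
  V4 = c("Child Care", "Paid", "Child Care", NA),
  stringsAsFactors = FALSE
)

v <- "Paid"

# TRUE for every row in which v occurs in any column
hit <- rowSums(x == v, na.rm = TRUE) > 0
unname(hit)
# [1] FALSE  TRUE FALSE  TRUE

# hit + 1 is 1 for FALSE and 2 for TRUE, so it picks NA or v out of c(NA, v)
paid_col <- c(NA, v)[hit + 1]
paid_col
# [1] NA     "Paid" NA     "Paid"
```

Running the same two steps once per template value, as the lapply call does, yields one output column per value.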
Or an alternative using pmax:
data.frame(
  lapply(
    setNames(template, seq_along(template)),
    function(v) do.call(pmax, c(replace(x, x != v, NA), na.rm = TRUE))
  )
)
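As a rough illustration of what the pmax version does for one template value: every cell that differs from the value is blanked out, and the row-wise maximum then picks up whatever survived. The names `only_v` and `dom_col` below are illustrative only.

```r
# The question's data, rebuilt so the example is self-contained
x <- data.frame(
  V1 = c("Sleep", "Sleep", "Sleep", "Sleep"),
  V2 = c("Domestic", "Domestic", "Domestic", "Eat"),
  V3 = c("Eat", "Eat", "Eat", "Paid"),
  V4 = c("Child Care", "Paid", "Child Care", NA),
  stringsAsFactors = FALSE
)

v <- "Domestic"

# Blank out every cell that is not v; cells that were already NA stay NA
only_v <- replace(x, x != v, NA)

# Each column of only_v becomes one argument to pmax(); with na.rm = TRUE
# the result per row is the surviving value, or NA when no cell matched
dom_col <- do.call(pmax, c(only_v, na.rm = TRUE))
dom_col
# [1] "Domestic" "Domestic" "Domestic" NA
```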
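A reshape2 and dplyr solution. Clearly not as compact as the others. The idea is to melt (make tall), order the factor, and cast (make wide). It uses the corrected template from above, with "Child Care" capitalised as in the data.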
library(reshape2)
library(dplyr)
# make an id column
x$id <- row.names(x)
# make a tall result: id, variable, value
tall <- x %>%
  melt(id.vars = "id") %>%
  select(id, value)
# make an ordered factor whose levels follow the template
tall$value <- factor(tall$value, levels = template, ordered = TRUE)
# make the wide result with dcast
result <- tall %>%
  filter(!is.na(value)) %>%  # drop the NAs
  mutate(var = value) %>%    # name the column the same as the value
  dcast(id ~ var)            # cast into wide format
result
#  id Sleep Eat Domestic Paid Child Care
#1  1 Sleep Eat Domestic <NA> Child Care
#2  2 Sleep Eat Domestic Paid       <NA>
#3  3 Sleep Eat Domestic <NA> Child Care
#4  4 Sleep Eat     <NA> Paid       <NA>
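The factor step above depends on the template matching the data exactly, including case: factor() with explicit levels turns any value that is not a level into NA without a warning. A minimal check worth running before reshaping, shown here with the question's original template (lowercase "Child care"); the names `values` and `template_orig` are illustrative.

```r
# Distinct values found in the question's data
values <- c("Sleep", "Domestic", "Eat", "Child Care", "Paid")

# The template exactly as written in the question
template_orig <- c("Sleep", "Eat", "Domestic", "Paid", "Child care")

# Values in the data that have no slot in the template
setdiff(values, template_orig)
# [1] "Child Care"

# factor() silently maps such values to NA
f <- factor(values, levels = template_orig)
is.na(f)
# [1] FALSE FALSE FALSE  TRUE FALSE
```

An empty result from setdiff() means every value in the data has a slot in the template.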
Here is a tidyverse option:
library(dplyr)
library(tidyr)
library(tibble)
rownames_to_column(x, 'id') %>%
  gather(Var, Val, -id, na.rm = TRUE) %>%
  mutate(Var = factor(Val, levels = template)) %>%
  spread(Var, Val) %>%
  select(-id) %>%
  setNames(., paste0("V", seq_along(template)))
#     V1  V2       V3   V4         V5
#1 Sleep Eat Domestic <NA> Child Care
#2 Sleep Eat Domestic Paid       <NA>
#3 Sleep Eat Domestic <NA> Child Care
#4 Sleep Eat     <NA> Paid       <NA>
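The tidyr documentation recommends pivot_longer() and pivot_wider() over gather() and spread() for new code. The following is a sketch of the same idea with those functions, not a drop-in replacement taken from the answer. It assumes a reasonably recent tidyr and that every template value occurs at least once in the data, because select(all_of(template)) raises an error for a missing column. The name `res` is illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)

# The question's data and the corrected template, rebuilt for a self-contained run
x <- data.frame(
  V1 = c("Sleep", "Sleep", "Sleep", "Sleep"),
  V2 = c("Domestic", "Domestic", "Domestic", "Eat"),
  V3 = c("Eat", "Eat", "Eat", "Paid"),
  V4 = c("Child Care", "Paid", "Child Care", NA),
  stringsAsFactors = FALSE
)
template <- c("Sleep", "Eat", "Domestic", "Paid", "Child Care")

res <- x %>%
  rownames_to_column("id") %>%
  # tall format: one row per (id, value), dropping the NA cells
  pivot_longer(-id, names_to = "Var", values_to = "Val", values_drop_na = TRUE) %>%
  # the target column is named after the value itself
  mutate(Var = Val) %>%
  pivot_wider(names_from = Var, values_from = Val) %>%
  # put the columns in template order; this also drops id
  select(all_of(template)) %>%
  setNames(paste0("V", seq_along(template)))

res
```

Here the column order comes from the explicit select() rather than from factor levels, so the conversion to a factor is not needed in this variant.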