Paste one column to all other columns if they are not NA
My problem is that I have a data frame with 5 columns: 4 of them contain names and 1 contains a status. For example:
X1 X2 X3 X4 X5
1 name1 NA name3 NA active
2 name1 name2 NA name4 inactive
3 NA name2 name3 name4 unknown
4 name1 name2 NA NA inactive
5 name1 name2 name3 name4 unknown
What I would like to do is alternate column X5 with each of X1, X2, X3 and X4 and paste each pair together with an underscore (name1_active, name2_inactive), leaving NA entries as NA.
X1 X5 X2 X5 X3 X5 X4 X5
1 name1 active NA NA name3 active NA NA
2 name1 inactive name2 inactive NA NA name4 inactive
3 NA NA name2 unknown name3 unknown name4 unknown
4 name1 inactive name2 inactive NA NA NA NA
5 name1 unknown name2 unknown name3 unknown name4 unknown
Output:
X1 X2 X3 X4
1 name1_active NA name3_active NA
2 name1_inactive name2_inactive NA name4_inactive
3 NA name2_unknown name3_unknown name4_unknown
4 name1_inactive name2_inactive NA NA
5 name1_unknown name2_unknown name3_unknown name4_unknown
Try:
d <- read.table(text = "X1 X2 X3 X4 X5
1 name1 NA name3 NA active
2 name1 name2 NA name4 inactive
3 NA name2 name3 name4 unknown
4 name1 name2 NA NA inactive
5 name1 name2 name3 name4 unknown", header = TRUE)
as.data.frame(lapply(d[, 1:4], function(x) ifelse(is.na(x), NA, paste(x, d$X5, sep = "_"))))
# X1 X2 X3 X4
#1 name1_active <NA> name3_active <NA>
#2 name1_inactive name2_inactive <NA> name4_inactive
#3 <NA> name2_unknown name3_unknown name4_unknown
#4 name1_inactive name2_inactive <NA> <NA>
#5 name1_unknown name2_unknown name3_unknown name4_unknown
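The ifelse(is.na(x), NA, ...) guard in the answer above matters because base R's paste() turns NA into the literal string "NA". A minimal sketch of the difference, using two small made-up vectors (x and status are illustrative names, not part of the question's data):

```r
# Illustrative vectors only; not the question's data frame
x <- c("name1", NA, "name3")
status <- c("active", "inactive", "unknown")

# Without a guard, paste() coerces NA to the string "NA"
naive <- paste(x, status, sep = "_")

# With the guard, positions where x is NA stay NA
guarded <- ifelse(is.na(x), NA, paste(x, status, sep = "_"))

print(naive)
print(guarded)
```

The unguarded version yields "NA_inactive" in the second position, which is exactly what the question wants to avoid.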
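This is similar to r.user.05apr's answer, but I want to show that we can use lapply to loop over the columns and replace them directly in the original data frame.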
dat[, paste0("X", 1:4)] <- lapply(dat[, paste0("X", 1:4)], function(x){
ifelse(!is.na(x), paste(x, dat$X5, sep = "_"), x)
})
dat$X5 <- NULL
dat
# X1 X2 X3 X4
# 1 name1_active <NA> name3_active <NA>
# 2 name1_inactive name2_inactive <NA> name4_inactive
# 3 <NA> name2_unknown name3_unknown name4_unknown
# 4 name1_inactive name2_inactive <NA> <NA>
# 5 name1_unknown name2_unknown name3_unknown name4_unknown
We can also use mutate_at from the dplyr package.
library(dplyr)
dat2 <- dat %>%
# funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
mutate_at(vars(-X5), ~ ifelse(!is.na(.), paste(., X5, sep = "_"), .)) %>%
select(-X5)
dat2
# X1 X2 X3 X4
# 1 name1_active <NA> name3_active <NA>
# 2 name1_inactive name2_inactive <NA> name4_inactive
# 3 <NA> name2_unknown name3_unknown name4_unknown
# 4 name1_inactive name2_inactive <NA> <NA>
# 5 name1_unknown name2_unknown name3_unknown name4_unknown
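In dplyr 1.0.0 and later, the scoped verbs such as mutate_at are superseded by across(). The following is a sketch of the same idea under that assumption, run on a small made-up data frame (toy is a hypothetical name, and only two name columns are included to keep it short):

```r
library(dplyr)

# Hypothetical toy data; X5 plays the role of the status column
toy <- data.frame(
  X1 = c("name1", "name1", NA),
  X2 = c(NA, "name2", "name2"),
  X5 = c("active", "inactive", "unknown"),
  stringsAsFactors = FALSE
)

out <- toy %>%
  # apply the NA-aware paste to every column except X5
  mutate(across(-X5, ~ ifelse(is.na(.x), NA, paste(.x, X5, sep = "_")))) %>%
  select(-X5)

print(out)
```

Inside the lambda, .x is the current column and X5 is looked up in the data frame, so no explicit toy$X5 is needed.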
Data
dat <- read.table(text = " X1 X2 X3 X4 X5
1 name1 NA name3 NA active
2 name1 name2 NA name4 inactive
3 NA name2 name3 name4 unknown
4 name1 name2 NA NA inactive
5 name1 name2 name3 name4 unknown",
header = TRUE, stringsAsFactors = FALSE)
I'll throw a purrr + stringr solution into the pot ;)
library(purrr)
library(stringr)
# my_data is assumed to be the question's data frame (d / dat in the other answers)
map_df(my_data[, 1:4], ~ str_c(.x, "_", my_data$X5))
# A tibble: 5 x 4
# X1 X2 X3 X4
# <chr> <chr> <chr> <chr>
# 1 name1_active NA name3_active NA
# 2 name1_inactive name2_inactive NA name4_inactive
# 3 NA name2_unknown name3_unknown name4_unknown
# 4 name1_inactive name2_inactive NA NA
# 5 name1_unknown name2_unknown name3_unknown name4_unknown
map_df automatically returns a tibble, and with str_c missing values are "contagious": any NA input makes the result NA.
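To make the "contagious" behaviour concrete, here is a small sketch contrasting stringr's str_c() with base R's paste0() on two made-up vectors (x and status are illustrative names):

```r
library(stringr)

# Illustrative vectors only
x <- c("name1", NA)
status <- c("active", "inactive")

# str_c(): an NA in any input gives NA in the output
contagious <- str_c(x, "_", status)

# paste0(): NA is coerced to the string "NA"
coerced <- paste0(x, "_", status)

print(contagious)
print(coerced)
```

This is why the purrr + stringr answer needs no explicit ifelse() guard.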
My solution uses apply:
df <- data.frame(A = c('a1', 'a2', 'a3'))
df$B <- c('b1', 'b2', 'b3')
df$C <- c('c1', 'c2', 'c3')
df$STATUS <- c('OK', 'BAD', 'OK')
# apply() over every column except the last (STATUS) returns a character matrix
df1 <- apply(df[, 1:(ncol(df) - 1)], 2, function(X) {
paste0(X, "_", df$STATUS)
})
df1
The result is:
A B C
[1,] "a1_OK" "b1_OK" "c1_OK"
[2,] "a2_BAD" "b2_BAD" "c2_BAD"
[3,] "a3_OK" "b3_OK" "c3_OK"
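Two things are worth noting about the apply approach: the result is a character matrix rather than a data frame, and the example data contains no NA values, so paste0() never gets the chance to produce strings like "NA_OK". A possible adaptation, sketched on a made-up data frame (toy_df and name_cols are hypothetical names), adds the NA guard and converts back to a data frame:

```r
# Hypothetical data with some missing names
toy_df <- data.frame(
  A = c("a1", NA, "a3"),
  B = c("b1", "b2", NA),
  STATUS = c("OK", "BAD", "OK"),
  stringsAsFactors = FALSE
)

name_cols <- setdiff(names(toy_df), "STATUS")

# apply() returns a character matrix; the guard keeps NA as NA
m <- apply(toy_df[, name_cols], 2, function(x) {
  ifelse(is.na(x), NA, paste0(x, "_", toy_df$STATUS))
})

# convert the matrix back to a data frame
res <- as.data.frame(m, stringsAsFactors = FALSE)
print(res)
```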
Using dplyr in a brute-force way, if I understand correctly (I am assuming you want to keep the NA_something and something_NA cases and only turn NA_NA into NA):
df2 <- df %>%
mutate(X1 = paste(X1,X5,sep="_")) %>%
mutate(X1 = ifelse(X1 %in% c("NA_NA"),NA,X1)) %>%
mutate(X2 = paste(X2,X5,sep="_")) %>%
mutate(X2 = ifelse(X2 %in% c("NA_NA"),NA,X2)) %>%
mutate(X3 = paste(X3,X5,sep="_")) %>%
mutate(X3 = ifelse(X3 %in% c("NA_NA"),NA,X3)) %>%
mutate(X4 = paste(X4,X5,sep="_")) %>%
mutate(X4 = ifelse(X4 %in% c("NA_NA"),NA,X4)) %>%
select(-X5)
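To see what the "NA_NA" check in this answer does and does not catch, here is a base R sketch on made-up vectors (x1 and x5 are illustrative names). Only the case where both the name and the status are missing becomes NA; a missing name with a known status stays as a string such as "NA_unknown":

```r
# Illustrative vectors only
x1 <- c("name1", NA, NA)
x5 <- c("active", "unknown", NA)

# paste() turns each NA into the string "NA"
pasted <- paste(x1, x5, sep = "_")

# only the fully missing combination is reset to NA
cleaned <- ifelse(pasted %in% "NA_NA", NA, pasted)

print(pasted)
print(cleaned)
```

In the question's data X5 is never NA, so under this assumption the missing names would remain as "NA_status" strings rather than NA.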
If you assign to d[] instead of d, you can use lapply directly:
d[] <- lapply(d, function(x) ifelse(is.na(x), NA, paste(x,d$X5, sep="_")))
# or, excluding the 5th col
d[,-5] <- lapply(d[,-5], function(x) ifelse(is.na(x), NA, paste(x,d$X5, sep="_")))
Or, if you don't want to overwrite the values of d, you can use the pretty-looking "[<-" method:
"[<-"(d,,-5, lapply(d[,-5], function(.) ifelse(is.na(.), NA, paste(., d$X5, sep="_"))))
# notice two commas with nothing in between - not a typo
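The point of calling "[<-" as an ordinary function is that it returns a modified copy and leaves the original data frame alone. A small sketch on a made-up three-column data frame (toy, f and out are hypothetical names; the status column is in position 3 here rather than 5):

```r
# Hypothetical toy data; the status column is the third one
toy <- data.frame(
  X1 = c("name1", NA),
  X2 = c(NA, "name2"),
  X5 = c("active", "inactive"),
  stringsAsFactors = FALSE
)

# NA-aware paste against the status column
f <- function(x) ifelse(is.na(x), NA, paste(x, toy$X5, sep = "_"))

# functional form of toy[, -3] <- ...; the empty argument is the row index
out <- "[<-"(toy, , -3, lapply(toy[, -3], f))

print(out)
print(toy)  # still holds the original values
```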
Finally, an environment()-friendly solution:
within(d,
list2env(
lapply(d,
function(x) ifelse(is.na(x), NA, paste(x,X5, sep="_"))),
environment()))