I would like to add a column that contains the characters from the other columns, without the NA values. I have tried paste, str_c and unite, but could not get the expected result. Maybe I used them incorrectly.
In the real case, I do not know the number of columns in advance, since the number of years varies between datasets: some contain 10 years, others 20.
Here is the input data:
input <- tibble(
id = c('aa', 'ss', 'dd', 'qq'),
'2017' = c('tv', NA, NA, 'web'),
'2018' = c(NA, 'web', NA, NA),
'2019' = c(NA, NA, 'book', 'tv')
)
# A tibble: 4 x 4
id `2017` `2018` `2019`
<chr> <chr> <chr> <chr>
1 aa tv NA NA
2 ss NA web NA
3 dd NA NA book
4 qq web NA tv
The desired output with the ALL column is:
> output
# A tibble: 4 x 5
id `2017` `2018` `2019` ALL
<chr> <chr> <chr> <chr> <chr>
1 aa tv NA NA tv
2 ss NA web NA web
3 dd NA NA book book
4 qq web NA tv web tv
Thanks for the help!
This is a duplicate of (or at least very close to) this question, but things have changed since then: unite now has an na.rm parameter that drops NA values.
As for column selection, the code below selects every column except the first without naming them explicitly, so it should also work for your case with a varying number of years.
library(tidyverse)
input %>%
unite("ALL", names(input)[-1], remove = FALSE, sep = " ", na.rm = TRUE)
# A tibble: 4 x 5
# id ALL `2017` `2018` `2019`
# <chr> <chr> <chr> <chr> <chr>
#1 aa tv tv NA NA
#2 ss web NA web NA
#3 dd book NA NA book
#4 qq web tv web NA tv
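The desired output in the question has ALL as the last column, whereas unite places it where the first united column was. A minimal sketch of one way to get that layout, assuming a tidyr release in which unite() has na.rm and a dplyr release that provides relocate(); the name output is only illustrative:

```r
library(tibble)
library(tidyr)
library(dplyr)

input <- tibble(
  id = c("aa", "ss", "dd", "qq"),
  `2017` = c("tv", NA, NA, "web"),
  `2018` = c(NA, "web", NA, NA),
  `2019` = c(NA, NA, "book", "tv")
)

output <- input %>%
  # -id selects every column except id, however many year columns there are
  unite("ALL", -id, remove = FALSE, sep = " ", na.rm = TRUE) %>%
  # move the new column to the end to match the layout asked for
  relocate(ALL, .after = last_col())

output
```

Selecting with -id rather than names(input)[-1] does the same job here; it just avoids relying on id being the first column.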
It worked for me after installing the development version of tidyr:
devtools::install_github("tidyverse/tidyr")
Here is a base R method:
input$ALL <- apply(input[-1], 1, function(x) paste(na.omit(x), collapse=" "))
input$ALL
#[1] "tv" "web" "book" "web tv"
For the sake of completeness (and to supplement LocoGris' data.table answer), here are three other approaches which update input by reference, i.e., without copying the whole data object.
All approaches return the same result and can handle an arbitrary number of years.
Note that id is assumed to be a unique key, i.e., without any duplicates.
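The uniqueness assumption matters because the approaches aggregate by id: with repeated ids there would be fewer groups than rows. A small sketch of a guard one might run first (this check is an addition for illustration, not part of the approaches themselves):

```r
library(data.table)

input <- data.table(
  id = c("aa", "ss", "dd", "qq"),
  `2017` = c("tv", NA, NA, "web"),
  `2018` = c(NA, "web", NA, NA),
  `2019` = c(NA, NA, "book", "tv")
)

# anyDuplicated() returns 0 when no value is repeated,
# otherwise the index of the first duplicate
stopifnot(anyDuplicated(input$id) == 0L)
```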
na.omit(), aggregate
library(data.table)
setDT(input)[, ALL := melt(input, id.var = "id")[, toString(na.omit(value)), by = id]$V1][]
   id 2017 2018 2019     ALL
1: aa   tv <NA> <NA>      tv
2: ss <NA>  web <NA>     web
3: dd <NA> <NA> book    book
4: qq  web <NA>   tv web, tv
BTW, reshaping from wide to long format gives a more concise way to store the sparsely populated data:
melt(input, id.var = "id", na.rm = TRUE)
   id variable value
1: aa     2017    tv
2: qq     2017   web
3: ss     2018   web
4: dd     2019  book
5: qq     2019    tv
library(data.table)
setDT(input)[melt(input, id.var = "id", na.rm = TRUE)[, toString(value), by = id],
on = "id", ALL := V1][]
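With na.rm = TRUE, the NA values are dropped in the reshape step. Because there are so many NAs, the row order of the reshaped result no longer matches the original, so the aggregated values cannot simply be assigned by position. Hence, an update join is required.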
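To make the row-order point concrete, here is a small sketch on the example data. The names long and agg are only illustrative, and the aggregated column is named ALL directly instead of the default V1:

```r
library(data.table)

input <- data.table(
  id = c("aa", "ss", "dd", "qq"),
  `2017` = c("tv", NA, NA, "web"),
  `2018` = c(NA, "web", NA, NA),
  `2019` = c(NA, NA, "book", "tv")
)

# Long format without the NA rows
long <- melt(input, id.vars = "id", na.rm = TRUE)

# One row per id; groups appear in order of first occurrence in long
agg <- long[, .(ALL = toString(value)), by = id]

input$id  # "aa" "ss" "dd" "qq"
agg$id    # "aa" "qq" "ss" "dd"  (not the original order)

# The update join matches rows on id, so the order of agg does not matter
input[agg, on = "id", ALL := i.ALL]
input
```

Replacing toString(value) with paste(value, collapse = " ") would give the space-separated form shown in the question instead of "web, tv".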
Filter(), aggregate
library(data.table)
setDT(input)[, ALL := .SD[, toString(Filter(Negate(is.na), .SD)), by = id]$V1][]
A data.table approach:
library(data.table)
library(tidyverse)
input <- data.table(
id = c('aa', 'ss', 'dd', 'qq'),
'2017' = c('tv', NA, NA, 'web'),
'2018' = c(NA, 'web', NA, NA),
'2019' = c(NA, NA, 'book', 'tv')
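)
# Row by row, paste the non-NA values of every column except id.
# (Replacing NA with "" first would leave stray spaces in ALL.)
input[, ALL := paste(na.omit(unlist(.SD)), collapse = " "), .SDcols = 2:ncol(input), by = seq_len(nrow(input))]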