Given the following dataframe:
zed = data.frame(Title = c('a', NA, NA, NA, NA, NA), TITLE = c(NA, 'b', NA, NA, NA, NA), "Title/Description" = c(NA, NA, 'c', 'd', NA, NA), title = c(NA, NA, NA, NA, 'e', 'f'))
We'd like to consolidate everything down into one Title
column. We can assume that in each row, there will only be 1 value that is not NA, and we are hunting for that value to add into the Title
column. We can consolidate this down via the following approach:
yah <- zed %>%
dplyr::mutate(Title = ifelse(!is.na(Title), Title,
ifelse(!is.na(`Title.Description`), `Title.Description`,
ifelse(!is.na(TITLE), TITLE,
ifelse(!is.na(title), title, ""))))) %>%
dplyr::select(-Title.Description, -TITLE, -title)
however this approach is messy, and becomes even messier if the dataframe has 10+ title columns. this approach also breaks if a column is not present in the data:
# there's no TITLE2 --> ERROR
yah <- zed %>%
dplyr::mutate(Title = ifelse(!is.na(Title), Title,
ifelse(!is.na(`Title.Description`), `Title.Description`,
ifelse(!is.na(TITLE2), TITLE2,
ifelse(!is.na(title), title, ""))))) %>%
dplyr::select(-Title.Description, -TITLE2, -title)
any recommendations on how to (a) eliminate the nasty ifelse() nesting and (b) not throw an error if one of the columns is omitted.
dplyr
has the coalesce
function that takes the first non-missing value:
yah <- zed %>%
mutate(Title = coalesce(
Title, Title.Description, TITLE2, title, ""
)
)
This will still break if one of the columns doesn't exist. I'd recommend identifying any missing columns and setting them to NA
before running this.
Another option is to select only the columns that contain the the word title
(regardless of case), then use coalesce
. I also give the new column a different name, so that I could remove all the columns that contain title
, then rename it.
library(tidyverse)
yah <- zed %>%
mutate(temp = coalesce(!!!select(., contains("title")))) %>%
select(-contains("title")) %>%
rename(Title = temp)
Output
yah
Title
1 a
2 b
3 c
4 d
5 e
6 f
Another option is to use max
with c_across
:
zed %>%
rowwise() %>%
mutate(temp = max(c_across(contains("title")), na.rm = TRUE)) %>%
select(-contains("title")) %>%
rename(Title = temp)
Or another option is to use reduce
with pmax
zed %>%
mutate(temp = reduce(select(., contains("title")), pmax, na.rm = TRUE)) %>%
select(-contains("title")) %>%
rename(Title = temp)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.