Thanks in advance for any assistance.
I have a dataframe:
df <- structure(list(ID = c("0001", "0002", "0003", "0004"), May_1 = c(1,
2, 1, 3), May_5 = c(NA, 1, 2, 1), May_10 = c(NA, 3, 3, NA), May_16 = c(2,
NA, NA, NA), May_20 = c(3, NA, NA, 2)), row.names = c(NA, -4L
), class = c("tbl_df", "tbl", "data.frame"))
I would like to create new columns named "First Preference", "Second Preference" and "Third Preference" based on the row values for each response.
If a row value == 1, I would like to append a column called "First Preference" that contains the name of the column where that value occurs, and likewise for 2 ("Second Preference") and 3 ("Third Preference").
My actual data contains about 40 dates that will be changing week over week, so a generalizable solution is most appreciated.
Here's the ideal df:
df_ideal <- structure(list(ID = c("0001", "0002", "0003", "0004"), May_1 = c(1,
2, 1, 3), May_5 = c(NA, 1, 2, 1), May_10 = c(NA, 3, 3, NA), May_16 = c(2,
NA, NA, NA), May_20 = c(3, NA, NA, 2), First_Preference = c("May_1",
"May_5", "May_1", "May_5"), Second_Preference = c("May_16", "May_1",
"May_5", "May_20"), Third_Preference = c("May_20", "May_10",
"May_10", "May_1")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
A tidyverse solution would be preferred, but I'm certainly open to anything.
Thanks!
In base R, we can use apply row-wise: order the values in each row while dropping the NA values (order with na.last = NA), then use that ordering to pick the corresponding column names.
cols <- paste(c('First', 'Second', 'Third'), "Preference", sep = "_")
df[cols] <- t(apply(df[-1], 1, function(x) names(df)[-1][order(x, na.last= NA)]))
df
# A tibble: 4 x 9
# ID May_1 May_5 May_10 May_16 May_20 First_Preference Second_Preference Third_Preference
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#1 0001 1 NA NA 2 3 May_1 May_16 May_20
#2 0002 2 1 3 NA NA May_5 May_1 May_10
#3 0003 1 2 3 NA NA May_1 May_5 May_10
#4 0004 3 1 NA NA 2 May_5 May_20 May_1
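The apply call above returns a matrix only when every row has exactly three ranked dates. If some respondents might rank fewer, one possible variation (a sketch, not part of the original answer) is to look up each rank separately, so a missing rank becomes NA. The names date_cols, labels and rank_mat are made up for this illustration, and the data is rebuilt as a plain data.frame so the snippet runs on its own.

```r
# Example data from the question, rebuilt as a plain data frame
df <- data.frame(
  ID     = c("0001", "0002", "0003", "0004"),
  May_1  = c(1, 2, 1, 3),
  May_5  = c(NA, 1, 2, 1),
  May_10 = c(NA, 3, 3, NA),
  May_16 = c(2, NA, NA, NA),
  May_20 = c(3, NA, NA, 2)
)

# Every column except the ID is treated as a date column (assumption)
date_cols <- setdiff(names(df), "ID")
labels <- paste(c("First", "Second", "Third"), "Preference", sep = "_")

# Numeric matrix of ranks, one row per respondent
rank_mat <- as.matrix(df[date_cols])

# For each rank k, find the first column holding k in each row;
# which() ignores NA, and indexing an empty result with [1] gives NA
df[labels] <- lapply(seq_along(labels), function(k) {
  hit <- apply(rank_mat == k, 1, function(r) which(r)[1])
  date_cols[hit]
})

df
```

Because date_cols is derived from the column names, nothing needs to change when the set of dates changes from week to week, as long as ID stays the only non-date column.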
We can reshape it to 'long' format while dropping the NA elements with values_drop_na, then use the 'value' column as an index to change the labels, and then convert back to 'wide' format with pivot_wider:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = -ID, values_drop_na = TRUE) %>%
  group_by(ID) %>%
  mutate(value = c("First_Preference", "Second_Preference",
                   "Third_Preference")[value]) %>%
  ungroup %>%
  pivot_wider(names_from = value, values_from = name) %>%
  left_join(df, .)
# A tibble: 4 x 9
# ID May_1 May_5 May_10 May_16 May_20 First_Preference Second_Preference Third_Preference
#* <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#1 0001 1 NA NA 2 3 May_1 May_16 May_20
#2 0002 2 1 3 NA NA May_5 May_1 May_10
#3 0003 1 2 3 NA NA May_1 May_5 May_10
#4 0004 3 1 NA NA 2 May_5 May_20 May_1
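The relabelling step in mutate relies on plain vector indexing: the numeric rank is used as a position into a character vector of labels. A tiny standalone illustration, where ranks is a made-up vector for one respondent:

```r
labels <- c("First_Preference", "Second_Preference", "Third_Preference")

# Hypothetical ranks as they would appear in the 'value' column
ranks <- c(2, 1, 3)

# Each rank picks the label at that position
picked <- labels[ranks]
picked
# [1] "Second_Preference" "First_Preference"  "Third_Preference"

# A rank beyond the number of labels yields NA rather than an error
labels[4]
# [1] NA
```

Since the lookup is elementwise, the group_by(ID) step is not strictly needed for it; supporting more ranks only requires a longer labels vector.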
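To generate the column names automatically, we can use ordinal from the english package. Note that the resulting names are lowercase (first_preference, second_preference, ...):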
library(english)
library(stringr)
df %>%
  pivot_longer(cols = -ID, values_drop_na = TRUE) %>%
  group_by(ID) %>%
  mutate(value = str_c(ordinal(value), "_preference")) %>%
  ungroup %>%
  pivot_wider(names_from = value, values_from = name) %>%
  left_join(df, .)
Or using data.table (this also uses ordinal, so the english package must be loaded):
library(data.table)
setDT(df)[dcast(melt(df, id.var = 'ID', na.rm = TRUE),
ID ~ paste0(ordinal(value), "_preference"), value.var = 'variable'), on = .(ID)]
# ID May_1 May_5 May_10 May_16 May_20 first_preference second_preference third_preference
#1: 0001 1 NA NA 2 3 May_1 May_16 May_20
#2: 0002 2 1 3 NA NA May_5 May_1 May_10
#3: 0003 1 2 3 NA NA May_1 May_5 May_10
#4: 0004 3 1 NA NA 2 May_5 May_20 May_1