I have data that looks like the following:
I want to convert this data to a binary matrix and attach it to the current data frame. What would be the most efficient way to do this?
I am very new to R, so please bear with me if this is a very basic question.
I tried extracting all the possible IDs (without brackets) and saved them in a variable called allIDs, which is a character vector. I used this to add empty columns (filled with NA) to the current data frame.
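We could do it this way: first create a copy of the IDs column, IDs1, and remove all punctuation (the brackets and commas) from it. Then use separate_rows() to get one row per ID, add a value column set to TRUE, pivot wider so that each ID becomes its own column (filling the gaps with FALSE), and finally multiply the new columns by one to turn TRUE/FALSE into 1 and 0.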
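library(dplyr)
library(tidyr)

df %>%
  # Work on a copy so the original IDs column is kept
  mutate(IDs1 = gsub("[[:punct:]]", "", IDs)) %>%
  # One row per ID (the remaining whitespace acts as the separator)
  separate_rows(IDs1) %>%
  mutate(value = TRUE) %>%
  # One column per ID; files without that ID get FALSE
  pivot_wider(names_from = IDs1, values_from = value, values_fill = FALSE) %>%
  # Convert TRUE/FALSE to 1/0
  mutate(across(-c(Filename, IDs), ~ .x * 1))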
output:
   Filename               IDs              `86I` `114J` `35Y` `126K`
   <chr>                  <chr>            <dbl>  <dbl> <dbl>  <dbl>
 1 20220628_SD02_8908.JPG [86I]                1      0     0      0
 2 20220628_SD02_8909.JPG [86I]                1      0     0      0
 3 20220628_SD02_8910.JPG [86I]                1      0     0      0
 4 20220628_SD02_8911.JPG [86I]                1      0     0      0
 5 20220628_SD02_8912.JPG [86I]                1      0     0      0
 6 20220628_SD02_8913.JPG [86I]                1      0     0      0
 7 20220628_SD02_8914.JPG [86I]                1      0     0      0
 8 20220628_SD02_8915.JPG [86I]                1      0     0      0
 9 20220628_SD02_8916.JPG [114J]               0      1     0      0
10 20220628_SD02_8918.JPG [114J]               0      1     0      0
11 20220628_SD02_8919.JPG [35Y, 114J, 12~      0      1     1      1
12 20220628_SD02_8922.JPG [35Y, 114J, 12~      0      1     1      1
13 20220628_SD02_8923.JPG [35Y, 114J, 12~      0      1     1      1
data:
df <- structure(list(Filename = c("20220628_SD02_8908.JPG", "20220628_SD02_8909.JPG",
"20220628_SD02_8910.JPG", "20220628_SD02_8911.JPG", "20220628_SD02_8912.JPG",
"20220628_SD02_8913.JPG", "20220628_SD02_8914.JPG", "20220628_SD02_8915.JPG",
"20220628_SD02_8916.JPG", "20220628_SD02_8918.JPG", "20220628_SD02_8919.JPG",
"20220628_SD02_8922.JPG", "20220628_SD02_8923.JPG"), IDs = c("[86I]",
"[86I]", "[86I]", "[86I]", "[86I]", "[86I]", "[86I]", "[86I]",
"[114J]", "[114J]", "[35Y, 114J, 126K]", "[35Y, 114J, 126K]",
"[35Y, 114J, 126K]")), class = "data.frame", row.names = c("18",
"19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29",
"30"))
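For comparison, here is a minimal base R sketch that follows the idea from the question of first collecting every ID into a character vector called allIDs. It is only one possible way to finish that approach, not the method used in the answer above. The three-row df is a shortened stand-in for the full data, and the names id_list, binary and result are made up for illustration.

```r
# Shortened stand-in for the data above (three rows only, for illustration)
df <- data.frame(
  Filename = c("20220628_SD02_8908.JPG", "20220628_SD02_8916.JPG",
               "20220628_SD02_8919.JPG"),
  IDs = c("[86I]", "[114J]", "[35Y, 114J, 126K]")
)

# Strip the square brackets, then split each string on commas into a list of IDs
id_list <- strsplit(gsub("\\[|\\]", "", df$IDs), ",\\s*")

# Every distinct ID, in order of first appearance (a character vector)
allIDs <- unique(unlist(id_list))

# One row per file, one column per ID: 1 if the ID is present, 0 otherwise
binary <- do.call(rbind, lapply(id_list, function(ids) as.integer(allIDs %in% ids)))
colnames(binary) <- allIDs

# Attach the indicator columns to the original data frame
result <- cbind(df, binary)
print(result)
```

Because the columns are built directly from allIDs, there is no need to add NA-filled columns first and fill them in afterwards.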