
Convert data frame info (a column that's a character list) into binary matrix

I have data that looks like the following: [image]

I want to convert this data to a binary matrix and attach it to the current data frame. What would be the most efficient way to do this?

I am very new to R, so please bear with me if this is a very basic question.

I tried extracting all the possible IDs (without brackets) and saved them in a variable called allIDs, which is a character vector. I then used it to add empty columns (filled with NA) to the current data frame.
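The attempt described above could look roughly like the following in base R. This is only a sketch: the two-row df is a made-up stand-in for the real data, and id_list is a helper name introduced here for illustration.

```r
# Hypothetical two-row example; column names follow the question.
df <- data.frame(
  Filename = c("a.JPG", "b.JPG"),
  IDs = c("[86I]", "[35Y, 114J]"),
  stringsAsFactors = FALSE
)

# Strip the square brackets, then split each entry on commas
# (allowing optional spaces after the comma).
id_list <- strsplit(gsub("\\[|\\]", "", df$IDs), ",\\s*")

# All distinct IDs, in order of first appearance.
allIDs <- unique(unlist(id_list))

# Add one empty (NA) column per ID to the existing data frame.
df[allIDs] <- NA
```

At this point the new columns exist but still need to be filled with 0/1 values.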

[image]

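We could do it this way: first create a copy of IDs called IDs1 and remove all punctuation from it. Then use separate_rows() to put each ID on its own row, add a column value set to TRUE, pivot wider so that each ID becomes its own column (filling the gaps with FALSE), and finally multiply the new columns by 1 to turn TRUE/FALSE into 1/0.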

library(dplyr)
library(tidyr)

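df %>% 
  # copy IDs and strip brackets and commas from the copy
  mutate(IDs1 = gsub("[[:punct:]]", "", IDs)) %>% 
  # one row per (Filename, ID) pair
  separate_rows(IDs1) %>% 
  mutate(value = TRUE) %>% 
  # one column per ID; combinations that do not occur become FALSE
  pivot_wider(names_from = IDs1, values_from = value, values_fill = FALSE) %>% 
  # turn TRUE/FALSE into 1/0
  mutate(across(-c(Filename, IDs), ~ . * 1))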

output:

     Filename               IDs             `86I` `114J` `35Y` `126K`
   <chr>                  <chr>           <dbl>  <dbl> <dbl>  <dbl>
 1 20220628_SD02_8908.JPG [86I]               1      0     0      0
 2 20220628_SD02_8909.JPG [86I]               1      0     0      0
 3 20220628_SD02_8910.JPG [86I]               1      0     0      0
 4 20220628_SD02_8911.JPG [86I]               1      0     0      0
 5 20220628_SD02_8912.JPG [86I]               1      0     0      0
 6 20220628_SD02_8913.JPG [86I]               1      0     0      0
 7 20220628_SD02_8914.JPG [86I]               1      0     0      0
 8 20220628_SD02_8915.JPG [86I]               1      0     0      0
 9 20220628_SD02_8916.JPG [114J]              0      1     0      0
10 20220628_SD02_8918.JPG [114J]              0      1     0      0
11 20220628_SD02_8919.JPG [35Y, 114J, 12~     0      1     1      1
12 20220628_SD02_8922.JPG [35Y, 114J, 12~     0      1     1      1
13 20220628_SD02_8923.JPG [35Y, 114J, 12~     0      1     1      1

data:

structure(list(Filename = c("20220628_SD02_8908.JPG", "20220628_SD02_8909.JPG", 
"20220628_SD02_8910.JPG", "20220628_SD02_8911.JPG", "20220628_SD02_8912.JPG", 
"20220628_SD02_8913.JPG", "20220628_SD02_8914.JPG", "20220628_SD02_8915.JPG", 
"20220628_SD02_8916.JPG", "20220628_SD02_8918.JPG", "20220628_SD02_8919.JPG", 
"20220628_SD02_8922.JPG", "20220628_SD02_8923.JPG"), IDs = c("[86I]", 
"[86I]", "[86I]", "[86I]", "[86I]", "[86I]", "[86I]", "[86I]", 
"[114J]", "[114J]", "[35Y, 114J, 126K]", "[35Y, 114J, 126K]", 
"[35Y, 114J, 126K]")), class = "data.frame", row.names = c("18", 
"19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", 
"30"))
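For comparison, a similar result can probably be reached in base R without dplyr or tidyr. The sketch below is not part of the answer above; it uses three rows taken from the data, and the names id_list, bin and result are chosen here purely for illustration.

```r
# Three rows taken from the data above.
df <- data.frame(
  Filename = c("20220628_SD02_8908.JPG", "20220628_SD02_8916.JPG",
               "20220628_SD02_8919.JPG"),
  IDs = c("[86I]", "[114J]", "[35Y, 114J, 126K]"),
  stringsAsFactors = FALSE
)

# Strip the brackets and split each entry into its individual IDs.
id_list <- strsplit(gsub("\\[|\\]", "", df$IDs), ",\\s*")

# All distinct IDs, in order of first appearance.
allIDs <- unique(unlist(id_list))

# One row per file, one column per ID: 1 if the file lists that ID, else 0.
bin <- do.call(rbind, lapply(id_list, function(ids) as.integer(allIDs %in% ids)))
colnames(bin) <- allIDs

# Attach the binary matrix to the original data frame.
result <- cbind(df, bin)
```

Here bin is a plain integer matrix, so it can also be used on its own if only the binary matrix is needed.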
