简体   繁体   中英

Recode categorical variables and reshape dataframe in R

I have a dataframe containing allele data for many individuals. There are several (>20) unique alleles for a gene listed in 2 columns. I would like to count the presence of 1 copy of an unique alleles as 1, 2 copies as 2 and the absence as 0. Also each unique allele count for each individual should be in separate columns.

My current df is

   ID   allele_1  allele_2 
   01   A01       A02
   02   A01       A07
   03   A08       A03
   04   A103      A01
   05   A09       A08
   06   A02       A02
   07   A08       A08 

I want my df to look like this;

   ID   A01  A02  A07  A08  A03 A103 A09  
   01   1    1    0    0    0   0    0 
   02   1    0    1    0    0   0    0
   03   0    0    0    1    1   0    0
   04   1    0    0    0    0   1    0
   05   0    0    0    1    0   0    1
   06   0    2    0    0    0   0    0
   07   0    0    0    2    0   0    0    

We can convert to 'long' format, get the 'count' and then reshape into 'wide'

library(dplyr)
library(tidyr)
df1 %>%
     pivot_longer(cols = -ID) %>%
     count(ID, value) %>% 
     pivot_wider(names_from = value, values_from = n, values_fill = list(n = 0))
# A tibble: 7 x 8
#     ID   A01   A02   A07   A03   A08  A103   A09
#  <int> <int> <int> <int> <int> <int> <int> <int>
#1     1     1     1     0     0     0     0     0
#2     2     1     0     1     0     0     0     0
#3     3     0     0     0     1     1     0     0
#4     4     1     0     0     0     0     1     0
#5     5     0     0     0     0     1     0     1
#6     6     0     2     0     0     0     0     0
#7     7     0     0     0     0     2     0     0

or with table from base R

table(rep(df1$ID, 2), unlist(df1[-1]))

Or with recast from reshape2

library(reshape2)
recast(df1, id.var = 'ID', ID ~ value, length)

data

df1 <- structure(list(ID = 1:7, allele_1 = c("A01", "A01", "A08", "A103", 
"A09", "A02", "A08"), allele_2 = c("A02", "A07", "A03", "A01", 
"A08", "A02", "A08")), class = "data.frame", row.names = c(NA, 
-7L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM