R: Reshaping data as multiple columns into rows

I have a df with multiple columns; a template is shown below. I would like to reshape the columns into rows in R. I am sure it is possible with the tidyr::gather() function, but I cannot manage it. I would be glad if someone could help!

Best wishes

# Df I have
             A1 A2 A3 A4  B1 B2 B3 B4  C1 C2 C3  C4  D1 D2 D3 D4
X1 X2 X3 X4   a b  c  d   e  f  g  h    i  j  k  l
Y1 Y2 Y3 Y4   m n  o  p    
Z1 Z2 Z3 Z4   r s  t  u   w  v  y  z 


# Df I would like to get

            Col1 Col2 Col3 Col4
X1 X2 X3 X4   a   b    c   d
X1 X2 X3 X4   e   f    g   h
X1 X2 X3 X4   i   j    k   l
Y1 Y2 Y3 Y4   m   n    o   p
Z1 Z2 Z3 Z4   r   s    t   u
Z1 Z2 Z3 Z4   w   v    y   z

We could also do this with a single pivot_longer call:

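library(dplyr)
library(tidyr)
library(stringr)

df %>%
  pivot_longer(cols = -id, names_to = c("grp", ".value"),
               # split each name between its letter and its digit, e.g. "A1" -> "A", "1"
               names_sep = "(?<=[A-Z])(?=[0-9])", values_drop_na = TRUE) %>%
  select(-grp) %>%
  # prefix the digit-named value columns with "Col"
  rename_with(~ str_c("Col", .x), .cols = -id)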
# A tibble: 7 x 5
#     id Col1  Col2  Col3  Col4 
#  <int> <chr> <chr> <chr> <chr>
#1     1 a     b     c     d    
#2     1 e     f     g     h    
#3     1 i     j     k     l    
#4     2 m     n     o     p    
#5     2 q     <NA>  <NA>  <NA> 
#6     3 r     s     t     u    
#7     3 w     v     y     z    
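
To see what the ".value" entry in names_to does, here is a minimal sketch on a hypothetical toy data frame (`toy` is not part of the original post). It uses names_pattern with two capture groups instead of the lookaround-based names_sep; as far as I can tell the two are interchangeable for names of the form letter + digit, but treat that as an assumption rather than a guarantee.

```r
library(tidyr)

# Hypothetical toy data: two groups (A, B), two positions (1, 2)
toy <- data.frame(
  id = 1:2,
  A1 = c("a", "m"), A2 = c("b", "n"),
  B1 = c("e", NA),  B2 = c("f", NA),
  stringsAsFactors = FALSE
)

long <- pivot_longer(
  toy,
  cols = -id,
  # first capture group -> "grp" column; second -> names of the output value columns
  names_to = c("grp", ".value"),
  names_pattern = "([A-Z])(\\d)",
  # drop rows in which every value column is NA
  values_drop_na = TRUE
)

long
```

The digit captured by the second group becomes the name of an output column, which is why the answer only has to prefix those names with "Col" afterwards. With this toy input the all-NA B group for id 2 should be dropped, leaving three rows.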

data

df <- structure(list(id = 1:3, A1 = c("a", "m", "r"), A2 = c("b", "n", 
"s"), A3 = c("c", "o", "t"), A4 = c("d", "p", "u"), B1 = c("e", 
"q", "w"), B2 = c("f", NA, "v"), B3 = c("g", NA, "y"), B4 = c("h", 
NA, "z"), C1 = c("i", NA, NA), C2 = c("j", NA, NA), C3 = c("k", 
NA, NA), C4 = c("l", NA, NA), D1 = c(NA, NA, NA), D2 = c(NA, 
NA, NA), D3 = c(NA, NA, NA), D4 = c(NA, NA, NA)), class = "data.frame",
row.names = c("1", 
"2", "3"))

I bet there are more elegant solutions, but this one uses tidyr and dplyr:

Suppose your data looks like

> df
# A tibble: 3 x 17
     id A1    A2    A3    A4    B1    B2    B3    B4    C1    C2    C3    C4    D1    D2    D3    D4   
  <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1     1 a     b     c     d     e     f     g     h     i     j     k     l     NA    NA    NA    NA   
2     2 m     n     o     p     q     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
3     3 r     s     t     u     w     v     y     z     NA    NA    NA    NA    NA    NA    NA    NA

I replaced your X1 X2 X3 X4, ... with an indexing column and added a q in column B1.

Using

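library(dplyr)
library(tidyr)

df %>%
  # keep only the trailing digit of each column name as "set"
  pivot_longer(cols=matches("\\d$"), 
               names_to = c("set"),
               names_pattern = ".(.)") %>%
  # one list-column per digit, holding the values of all letter groups
  pivot_wider(names_from="set", 
              names_prefix="Col",
              values_fn = list) %>%
  unnest(matches("\\d$")) %>%
  rowwise() %>%
  # drop rows in which every Col column is NA
  filter(sum(is.na(c_across(matches("\\d$")))) != ncol(.) - 1)  # -1 because of the indexing column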

returns

# A tibble: 7 x 5
# Rowwise: 
     id Col1  Col2  Col3  Col4 
  <dbl> <chr> <chr> <chr> <chr>
1     1 a     b     c     d    
2     1 e     f     g     h    
3     1 i     j     k     l    
4     2 m     n     o     p    
5     2 q     NA    NA    NA   
6     3 r     s     t     u    
7     3 w     v     y     z 
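
The last step only removes rows in which every Col column is NA. As a possible alternative to the rowwise() filter, here is a small sketch that assumes dplyr 1.0.4 or later (where if_all() is available); `wide` is a hypothetical tibble standing in for the unnested result, not data from the post.

```r
library(dplyr)

# Hypothetical stand-in for the unnested result
wide <- tibble(
  id   = c(1, 1, 2),
  Col1 = c("a", NA, "q"),
  Col2 = c("b", NA, NA)
)

# Keep a row unless all non-id columns are NA
kept <- wide %>%
  filter(!if_all(-id, is.na))

kept
```

This avoids rowwise(), so the result is an ordinary ungrouped tibble; the partially filled row for id 2 is kept, as in the output above.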
