summarise to multiple columns using tidyr

Question

I have a dataframe with two columns.

Column A (ref) is a vector of references, and column B (site) holds the study site reported in each reference.

My problem is that one reference can cover multiple study sites, and one study site may also appear in multiple references.

I want to summarise by study site, with one column for each reference linked to that site.

Something like:

Original table

ref  | site
-----------
A    | S1
A    | S2
B    | S1

New table

site  | ref1 | ref2
-------------------
S1    | A    | B
S2    | A    | NA

spread() doesn't work because some site values appear more than once.

Answer 1

Here's a way to get spread() to work: number the references within each site, then use those numbers as the new column names.

library(tidyverse)
original <- tibble(
  ref = c("A", "A", "B", "A"),
  site = c("S1", "S2", "S1", "S1")
)

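original %>%
  distinct() %>%                                   # drop the repeated (A, S1) row
  group_by(site) %>%
  mutate(refcount = str_c("ref", row_number())) %>% # "ref1", "ref2", ... within each site
  spread(refcount, ref)                             # one column per refcount value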
#> # A tibble: 2 x 3
#> # Groups:   site [2]
#>   site  ref1  ref2 
#>   <chr> <chr> <chr>
#> 1 S1    A     B    
#> 2 S2    A     <NA>

Created on 2018-06-07 by the reprex package (v0.2.0).
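In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same reshape using pivot_wider(), assuming a current dplyr/tidyr install. Here names_prefix builds the ref1, ref2 column names, so str_c() isn't needed, and ungroup() is called first so the result isn't left grouped by site as in the output above.

```r
library(dplyr)
library(tidyr)

original <- tibble(
  ref  = c("A", "A", "B", "A"),
  site = c("S1", "S2", "S1", "S1")
)

wide <- original %>%
  distinct() %>%                        # drop the repeated (A, S1) row
  group_by(site) %>%
  mutate(refcount = row_number()) %>%   # 1, 2, ... for each ref within a site
  ungroup() %>%                         # return a plain, ungrouped tibble
  pivot_wider(
    names_from   = refcount,            # one column per reference number
    values_from  = ref,
    names_prefix = "ref"                # column names become ref1, ref2, ...
  )

wide
```

Sites with fewer references than the maximum get NA in the extra columns, as S2 does in ref2.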
