I have a data frame with two columns.
Column A is a vector of references, and column B is the corresponding vector of study sites covered by each reference.
My problem is that one reference can contain multiple study sites, and one study site may also be found in multiple references.
I want to build a summary with one row per study site and as many reference columns as there are references linked to that site.
Something like:
Original table
-------------
ref | site
-------------
A   | S1
A   | S2
B   | S1
-------------

New table
-------------------
site | ref1 | ref2
-------------------
S1   | A    | B
S2   | A    | NA
-------------------

spread() doesn't work since there are duplicates of site.

Here's a way to get spread() to work and generate the columns you want as well.

    library(tidyverse)

    # Example data, including a fully duplicated row (A, S1)
    original <- tibble(
      ref  = c("A", "A", "B", "A"),
      site = c("S1", "S2", "S1", "S1")
    )

    original %>%
      distinct() %>%                                      # drop exact duplicate ref/site pairs
      group_by(site) %>%
      mutate(refcount = str_c("ref", row_number())) %>%   # ref1, ref2, ... within each site
      spread(refcount, ref)                               # one column per refcount value
    #> # A tibble: 2 x 3
    #> # Groups:   site [2]
    #>   site  ref1  ref2
    #>   <chr> <chr> <chr>
    #> 1 S1    A     B
    #> 2 S2    A     <NA>

Created on 2018-06-07 by the reprex package (v0.2.0).
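
For readers on a newer tidyr: spread() has been superseded by pivot_wider() since tidyr 1.0.0. The following is a minimal sketch of the same idea using pivot_wider(), not part of the original answer. It assumes tidyr >= 1.0.0 is installed; the name `wide` is only an illustrative variable, and the ungroup() call is an added choice so the result is a plain, ungrouped tibble.

```r
library(dplyr)
library(tidyr)

# Same example data as above, including the duplicated (A, S1) row
original <- tibble(
  ref  = c("A", "A", "B", "A"),
  site = c("S1", "S2", "S1", "S1")
)

wide <- original %>%
  distinct() %>%                                      # drop exact duplicate ref/site pairs
  group_by(site) %>%
  mutate(refcount = paste0("ref", row_number())) %>%  # ref1, ref2, ... within each site
  ungroup() %>%                                       # return an ungrouped tibble
  pivot_wider(names_from = refcount, values_from = ref)

print(wide)
```

Sites with fewer references than the maximum get NA in the remaining ref columns, as in the desired table. Numbering the references within each site is what gives every site/column pair a single value, which is the step that plain spreading on the raw data lacks.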