Reorganizing a dataframe in R by columns

Question

I'm new to coding and don't really know what to search for, because I can't find a fitting name for this operation. I'm sorry if I phrased the question poorly; I'm still unfamiliar with the correct terms. Regarding my problem: I have a dataset which is structured as follows:

plant <-  c("A", "B", "C", "D")
employee <- c("Peter, Mark", "Mark", "Peter", "Steven")
df <- data.frame(plant, employee)

  plant    employee
1     A Peter, Mark
2     B        Mark
3     C       Peter
4     D      Steven

I now want to "reorganize" the data frame by employee so that it looks like this:

  employee plant
1    Peter  A, C
2     Mark  A, B
3   Steven     D

I'm at a loss as to where to look for directions or a solution, and I would appreciate any hint. Is this possible in base R?

Answer 1

We can use separate_rows to split the 'employee' column into one row per employee, then group by 'employee' and paste the 'plant' values together with toString.

library(dplyr)
library(tidyr)
df %>% 
  separate_rows(employee) %>%
  group_by(employee) %>% 
  summarise(plant = toString(plant))
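One caveat with the pipeline above: the default separator of separate_rows is the regular expression "[^[:alnum:].]+", so it splits on any run of non-alphanumeric characters, including spaces. That is fine for the data in the question, but a name containing a space would be cut in two. A minimal sketch with an explicit separator, assuming a hypothetical extra row (plant "E", employee "Mary Ann, Peter") that is not in the original data:

```r
library(dplyr)
library(tidyr)

# Data from the question plus one hypothetical row (plant "E")
# whose employee name contains a space
df <- data.frame(
  plant = c("A", "B", "C", "D", "E"),
  employee = c("Peter, Mark", "Mark", "Peter", "Steven", "Mary Ann, Peter"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # split only on a comma followed by optional whitespace
  separate_rows(employee, sep = ",\\s*") %>%
  group_by(employee) %>%
  # collapse the plants of each employee into one comma-separated string
  summarise(plant = toString(plant))

result
```

With the default separator, "Mary Ann" would end up as two separate employees, "Mary" and "Ann".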

If we need to use base R, an option is to split the 'employee' column with strsplit into a list of vectors, set the names of the list from the 'plant' column, convert the named list to a two-column data.frame with stack, and use aggregate to paste the values by group (toString is equivalent to paste(..., collapse = ", ")).

aggregate(ind ~ values, stack(setNames(strsplit(as.character(df$employee),
            ",\\s*"), df$plant)), toString)
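The one-liner is fairly dense, so here is the same idea unrolled into separate steps. This is only a sketch of the approach described above; the intermediate names (emp_list, long, result) and the final renaming step are my own additions for illustration.

```r
# Example data from the question
df <- data.frame(
  plant = c("A", "B", "C", "D"),
  employee = c("Peter, Mark", "Mark", "Peter", "Steven"),
  stringsAsFactors = FALSE
)

# 1. Split each 'employee' string on a comma plus optional whitespace
emp_list <- strsplit(as.character(df$employee), ",\\s*")

# 2. Name each list element after the plant it belongs to
emp_list <- setNames(emp_list, df$plant)

# 3. stack() flattens the named list into two columns:
#    'values' (the employee) and 'ind' (the plant, as a factor)
long <- stack(emp_list)

# 4. For each employee, collapse the plants into one string
result <- aggregate(ind ~ values, data = long, FUN = toString)

# 5. Rename the columns to the layout asked for in the question
names(result) <- c("employee", "plant")
result
```

Printing long after step 3 is a good way to see what stack does: it has one row per employee/plant pair, which is the "long" shape that aggregate then groups.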

Answer 2

Using base R, we can split the employee values on "," and repeat each plant value once per employee in its row. We can then use tapply to combine the plant values for each employee.

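# as.character() guards against 'employee' being a factor
# (the data.frame default before R 4.0.0)
temp <- strsplit(as.character(df$employee), ",", fixed = TRUE)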
stack(tapply(rep(df$plant, lengths(temp)), trimws(unlist(temp)), toString))


#  values    ind
#1   A, B   Mark
#2   A, C  Peter
#3      D Steven
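The output above has the generic column names values and ind that stack produces. If the layout from the question is wanted (an employee column first, then plant, in order of first appearance), one possible sketch is to build the data frame by hand from the tapply result. The names plants_by_employee, first_seen and result are my own and only illustrative.

```r
# Example data from the question
df <- data.frame(
  plant = c("A", "B", "C", "D"),
  employee = c("Peter, Mark", "Mark", "Peter", "Steven"),
  stringsAsFactors = FALSE
)

temp <- strsplit(df$employee, ",", fixed = TRUE)
employees <- trimws(unlist(temp))

# Named 1-d array: one comma-separated string of plants per employee
plants_by_employee <- tapply(rep(df$plant, lengths(temp)), employees, toString)

# Build the data frame with the column names from the question
result <- data.frame(
  employee = names(plants_by_employee),
  plant = as.vector(plants_by_employee),
  stringsAsFactors = FALSE
)

# tapply sorts the groups alphabetically; restore first-appearance order
first_seen <- unique(employees)
result <- result[match(first_seen, result$employee), ]
rownames(result) <- NULL
result
```

The reordering step is optional; without it the rows come out alphabetically by employee, as in the output shown above.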

2 answers

solution1
1 ACCEPTED 2020-05-06 22:41:05

solution2
1 2020-05-07 02:34:02
