I'm new to coding and I don't really know what to search for, because I can't find a fitting name for this operation. Sorry if the question is phrased poorly; I'm still unfamiliar with the correct terms. My problem: I have a dataset that is structured as follows:
plant <- c("A", "B", "C", "D")
employee <- c("Peter, Mark", "Mark", "Peter", "Steven")
df <- data.frame(plant, employee)
plant employee
1 A Peter, Mark
2 B Mark
3 C Peter
4 D Steven
I now want to "reorganize" the data frame by employee so that it looks like this:
employee plant
1 Peter A, C
2 Mark A, B
3 Steven D
I don't know where to look for directions or a solution, so I would appreciate any hint. Is this possible in base R?
We can use separate_rows to split the 'employee' column, then, grouped by 'employee', paste the 'plant' values together:
library(dplyr)
library(tidyr)
df %>%
  separate_rows(employee, sep = ",\\s*") %>%  # one row per plant/employee pair
  group_by(employee) %>%
  summarise(plant = toString(plant))          # collapse plants, e.g. "A, C"
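This kind of reshaping is often described as splitting a delimited column into rows (a "long" table) and then aggregating per group, which may be useful search terms. To see the intermediate table that separate_rows builds before the grouping step, here is a minimal base R sketch of it. It assumes names are separated by a comma plus optional spaces; the object names `parts` and `long` are hypothetical.

```r
# Example data from the question
df <- data.frame(
  plant = c("A", "B", "C", "D"),
  employee = c("Peter, Mark", "Mark", "Peter", "Steven"),
  stringsAsFactors = FALSE
)

# Split each employee string on a comma followed by optional whitespace
parts <- strsplit(df$employee, ",\\s*")

# Repeat each plant once per name in its row, giving one row per pair
# (this mirrors the long table that separate_rows() produces)
long <- data.frame(
  plant = rep(df$plant, lengths(parts)),
  employee = unlist(parts),
  stringsAsFactors = FALSE
)
long
#   plant employee
# 1     A    Peter
# 2     A     Mark
# 3     B     Mark
# 4     C    Peter
# 5     D   Steven
```

Grouping this long table by employee and pasting the plants together is exactly what the remaining two pipeline steps do.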
If we need to use base R, an option is to split the 'employee' column with strsplit into a list of vectors, set the names of the list with the 'plant' column, convert the named list to a two-column data.frame with stack, and use aggregate to do a group-by paste (toString is equivalent to paste(..., collapse = ", ")).
aggregate(ind ~ values, stack(setNames(strsplit(as.character(df$employee),
",\\s*"), df$plant)), toString)
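The one-liner above can be unpacked into named steps to show what each function contributes. This is a sketch on the question's example data; the intermediate names `split_list`, `stacked`, and `result` are hypothetical.

```r
df <- data.frame(
  plant = c("A", "B", "C", "D"),
  employee = c("Peter, Mark", "Mark", "Peter", "Steven"),
  stringsAsFactors = FALSE
)

# 1. Split each employee string into a character vector of names
split_list <- strsplit(as.character(df$employee), ",\\s*")

# 2. Name each list element after its plant:
#    list(A = c("Peter", "Mark"), B = "Mark", C = "Peter", D = "Steven")
split_list <- setNames(split_list, df$plant)

# 3. stack() turns the named list into a two-column data frame:
#    'values' holds the employee names, 'ind' (a factor) holds the plants
stacked <- stack(split_list)

# 4. Group by employee ('values') and collapse the plants with toString()
result <- aggregate(ind ~ values, data = stacked, FUN = toString)
result
```

The groups come out in alphabetical order (Mark, Peter, Steven) because aggregate sorts the grouping values.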
Using base R, we can split the employee names on "," and repeat the plant values according to the number of names in each row. We can then use tapply to combine the plant values for each employee.
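# as.character() guards against factor columns (the default before R 4.0)
temp <- strsplit(as.character(df$employee), ",", fixed = TRUE)
# repeat each plant once per name, trim spaces, and collapse per employee
stack(tapply(rep(df$plant, lengths(temp)), trimws(unlist(temp)), toString))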
# values ind
#1 A, B Mark
#2 A, C Peter
#3 D Steven
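The stack() result puts the plants first and sorts employees alphabetically. If the exact layout from the question is wanted (an `employee` column followed by a `plant` column, employees in order of first appearance), a minimal sketch building on the same tapply idea is below; the helper names `names_per_row`, `emp`, `plants`, `collapsed`, and `out` are hypothetical.

```r
df <- data.frame(
  plant = c("A", "B", "C", "D"),
  employee = c("Peter, Mark", "Mark", "Peter", "Steven"),
  stringsAsFactors = FALSE
)

# Split on commas and strip surrounding whitespace from each name
names_per_row <- strsplit(as.character(df$employee), ",", fixed = TRUE)
emp <- trimws(unlist(names_per_row))

# One plant entry per employee occurrence
plants <- rep(df$plant, lengths(names_per_row))

# A factor whose levels follow first appearance makes tapply() keep the
# original order (Peter, Mark, Steven) instead of sorting alphabetically
emp <- factor(emp, levels = unique(emp))
collapsed <- tapply(plants, emp, toString)

# Build the layout from the question: employee first, then plant
out <- data.frame(
  employee = names(collapsed),
  plant = as.vector(collapsed),
  stringsAsFactors = FALSE
)
out
#   employee plant
# 1    Peter  A, C
# 2     Mark  A, B
# 3   Steven     D
```

Using factor levels to control group order is a design choice; dropping the factor() line gives the alphabetical order shown above.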