
Reorganizing a dataframe in R by columns

I'm new to coding and I don't really know what to google/search for because I can't find a fitting name for this operation. I'm sorry if I phrased the question poorly; I'm still unfamiliar with the correct terms. Regarding my problem: I have a dataset which is structured as follows:

plant <-  c("A", "B", "C", "D")
employee <- c("Peter, Mark", "Mark", "Peter", "Steven")
df <- data.frame(plant, employee)

  plant    employee
1     A Peter, Mark
2     B        Mark
3     C       Peter
4     D      Steven

I now want to "reorganize" the dataframe by employee so that it looks like this:

  employee plant
1    Peter  A, C
2     Mark  A, B
3   Steven     D

I'm really at a loss as to where to look for directions or a solution, and I would appreciate any hint. Is this possible in base R?

We can use separate_rows to split the 'employee' column into one row per name, then group by 'employee' and paste the 'plant' values together with toString.

library(dplyr)
library(tidyr)
df %>% 
  separate_rows(employee) %>%
  group_by(employee) %>% 
  summarise(plant = toString(plant))
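
As a hedged variation on the pipeline above: the default sep of separate_rows splits on runs of non-alphanumeric characters, so a name containing a space (a hypothetical "Mary Ann", which is not in the question's data) would be broken apart. A minimal sketch, assuming dplyr and tidyr are installed, that passes an explicit separator and keeps the intermediate long table in a variable (long and result are illustrative names):

```r
library(dplyr)
library(tidyr)

# Example data from the question; stringsAsFactors = FALSE keeps text as character
df <- data.frame(
  plant = c("A", "B", "C", "D"),
  employee = c("Peter, Mark", "Mark", "Peter", "Steven"),
  stringsAsFactors = FALSE
)

# One row per (plant, employee) pair; split only on a comma plus optional spaces
long <- separate_rows(df, employee, sep = ",\\s*")

# Collapse the plants of each employee into one comma-separated string
result <- long %>%
  group_by(employee) %>%
  summarise(plant = toString(plant))

print(result)
```

Inspecting long before summarising is a quick way to check that the split produced the names you expect.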

If we need to use base R, one option is to split the 'employee' column with strsplit into a list of vectors, set the names of the list from the 'plant' column, convert the named list to a two-column data.frame with stack, and use aggregate to paste the plants together by group (toString is equivalent to paste(..., collapse = ", ")).

aggregate(ind ~ values, stack(setNames(strsplit(as.character(df$employee),
            ",\\s*"), df$plant)), toString)

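The nested one-liner above can be easier to follow when unrolled. A sketch of the same steps, one per line, assuming the example data from the question (parts, long and result are illustrative names, and the final renaming of the columns is an addition to match the desired output):

```r
# Example data from the question; stringsAsFactors = FALSE keeps text as character
df <- data.frame(
  plant = c("A", "B", "C", "D"),
  employee = c("Peter, Mark", "Mark", "Peter", "Steven"),
  stringsAsFactors = FALSE
)

# 1. Split each employee string on a comma followed by optional whitespace
parts <- strsplit(df$employee, ",\\s*")

# 2. Name each list element after the plant it came from
names(parts) <- df$plant

# 3. stack() flattens the named list: 'values' holds employees, 'ind' holds plants
long <- stack(parts)

# 4. Collapse the plants of each employee into one string
result <- aggregate(ind ~ values, data = long, FUN = toString)

# Rename to match the layout asked for in the question
names(result) <- c("employee", "plant")
print(result)
```
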
Using base R, we can split the employee column on "," and repeat the plant values according to the number of names in each row. We can then use tapply to combine the plant values for each employee.

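# as.character() guards against 'employee' being a factor (the data.frame() default before R 4.0)
temp <- strsplit(as.character(df$employee), ",", fixed = TRUE)
stack(tapply(rep(df$plant, lengths(temp)), trimws(unlist(temp)), toString))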


#  values    ind
#1   A, B   Mark
#2   A, C  Peter
#3      D Steven
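
To see what each piece of the tapply approach contributes, here is a hedged step-by-step sketch on the question's data. The intermediate names are illustrative, and the last step builds a data.frame by hand instead of calling stack, only so that the columns come out named and ordered as in the question:

```r
# Example data from the question; stringsAsFactors = FALSE keeps text as character
df <- data.frame(
  plant = c("A", "B", "C", "D"),
  employee = c("Peter, Mark", "Mark", "Peter", "Steven"),
  stringsAsFactors = FALSE
)

# Split on literal commas; names may still carry a leading space
temp <- strsplit(df$employee, ",", fixed = TRUE)

# Number of employees listed in each row
n_per_row <- lengths(temp)

# Repeat each plant once per employee found in its row
plant_long <- rep(df$plant, n_per_row)

# Flatten the list and strip the surrounding whitespace
employee_long <- trimws(unlist(temp))

# Group the plants by employee and collapse each group into one string
by_employee <- tapply(plant_long, employee_long, toString)

result <- data.frame(
  employee = names(by_employee),
  plant = as.vector(by_employee),
  stringsAsFactors = FALSE
)
print(result)
```
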


 