How to combine specific data across multiple rows in a dataframe in R

I am looking to alter (concatenate or reshape; I am not sure which word is right for this scenario) the data in my data frame by combining the cells of one column across rows where all the other columns in those rows are identical.

Basically, I have something like this:

    >df
    >Person_id     System_id    Category    Type    Tag
    >1A            134          1            Chr     Question
    >1A            134          1            Chr     Answer
    >1A            134          1            Chr     Evaluation
    >1A            134          1            Chr     Overall
    >1A            134          1            Chr     Analysis
    >Z4            002          1            Chr     Question
    >Z4            002          1            Chr     Answer

And I want to get it to look something like this:

    >Person_id     System_id    Category    Type    Tag
    >1A            134          1            Chr     Question, Answer, Evaluation, Overall, Analysis
    >Z4            002          1            Chr     Question, Answer

The tags don't have to be separated by a comma; a space is fine. Any ideas on where to look for a solution like this would be helpful.

Thank you.

We can group by the first four columns and paste the 'Tag' elements together:

library(dplyr)
df %>%
   group_by_at(1:4) %>%
   summarise(Tag = toString(Tag))
# A tibble: 2 x 5
# Groups:   Person_id, System_id, Category [2]
#  Person_id System_id Category Type  Tag                                            
#  <chr>         <int>    <int> <chr> <chr>                                          
#1 1A              134        1 Chr   Question, Answer, Evaluation, Overall, Analysis
#2 Z4                2        1 Chr   Question, Answer    

Or using base R:

aggregate(Tag ~ ., df, toString)

NOTE: toString(x) is a convenient wrapper for paste(x, collapse = ", ")

data
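To make the note above concrete, here is a minimal sketch comparing toString() with paste() and an explicit collapse argument. The vector name `tags` is made up for illustration; the space-separated variant addresses the question's remark that a space is an acceptable separator.

```r
# Hypothetical example vector, not part of the original data
tags <- c("Question", "Answer", "Evaluation")

# toString() joins the elements with ", "
comma_joined <- toString(tags)

# paste() with collapse gives the same string and lets you choose the separator
same_as_toString <- paste(tags, collapse = ", ")
space_joined <- paste(tags, collapse = " ")

print(identical(comma_joined, same_as_toString))
print(space_joined)
```

If a space separator is preferred, either approach in this answer should work with paste(Tag, collapse = " ") in place of toString(Tag).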

df <- structure(list(Person_id = c("1A", "1A", "1A", "1A", "1A", "Z4", 
"Z4"), System_id = c(134L, 134L, 134L, 134L, 134L, 2L, 2L), Category = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L), Type = c("Chr", "Chr", "Chr", "Chr", 
"Chr", "Chr", "Chr"), Tag = c("Question", "Answer", "Evaluation", 
"Overall", "Analysis", "Question", "Answer")), 
 class = "data.frame", row.names = c(NA, 
-7L))
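As a variation on the base R approach, the grouping columns can be named explicitly and the separator changed to a space. This is only a sketch: `df_small` is a cut-down, hypothetical copy of the data above so that the block runs on its own, and the anonymous function is one way among several to pass a custom separator.

```r
# Cut-down copy of the example data (hypothetical name: df_small)
df_small <- data.frame(
  Person_id = c("1A", "1A", "1A", "Z4", "Z4"),
  System_id = c(134L, 134L, 134L, 2L, 2L),
  Category  = c(1L, 1L, 1L, 1L, 1L),
  Type      = c("Chr", "Chr", "Chr", "Chr", "Chr"),
  Tag       = c("Question", "Answer", "Evaluation", "Question", "Answer"),
  stringsAsFactors = FALSE
)

# Group by the four identifying columns and join the tags with a space
res <- aggregate(
  Tag ~ Person_id + System_id + Category + Type,
  data = df_small,
  FUN = function(x) paste(x, collapse = " ")
)

print(res)
```

Naming the grouping columns makes the intent explicit, whereas `Tag ~ .` groups by every column other than Tag, which is shorter but changes meaning if columns are added later.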

You can use paste0 with collapse = ", " to achieve this:

library(dplyr)
df %>%
  group_by(Person_id, System_id, Category, Type) %>%
  summarise(Tag = paste0(Tag, collapse = ", "))

#  Person_id System_id Category Type  Tag                                            
#  <chr>         <int>    <int> <chr> <chr>                                          
#1 1A              134        1 Chr   Question, Answer, Evaluation, Overall, Analysis
#2 Z4                2        1 Chr   Question, Answer
