
R: format a data.frame into another 'combined' data.frame based on common values within a column dependent across different columns

I'm starting with a data frame that consists of three columns. Column 1 contains ids that indicate 3 different time periods in which the weight in kg (column 3) of some persons (column 2) has been measured.

All persons have been measured irregularly, which means that some persons are measured multiple times or just once within a time period, but not across all time periods.

   id       person_name person_weight
    1          Carol         51
    1          Mike          76
    1          Mike          81
    1          Dave          66
    1          Carol         59
    2          James         78
    2          Simone        55
    2          Simone        49
    2          David         85
    3          Mike          93
    3          Dave          110
    3          Dave          98 
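
For reproducibility, the table above could be rebuilt roughly like this. This is only a sketch: the object name `dat` is an assumption (picked to match the name used in the answer further down), and storing the names as plain character strings is also an assumption.

```r
# Rebuild the example table as a data frame.
# `dat` is an assumed name; the post does not name the object.
dat <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  person_name = c("Carol", "Mike", "Mike", "Dave", "Carol",
                  "James", "Simone", "Simone", "David",
                  "Mike", "Dave", "Dave"),
  person_weight = c(51, 76, 81, 66, 59, 78, 55, 49, 85, 93, 110, 98),
  stringsAsFactors = FALSE
)

# Quick look at the structure: 12 rows, 3 columns
str(dat)
```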

Actually, the whole thing here is just a simplified example, so don't worry if this kind of data collection makes no sense.

Now, I want to calculate the average (mean) weight for each person within each time period and end up with a combined data frame that looks like the following one:

group_id    Carol   Mike    Dave    James   Simone  David
   1         55     78.5     66      NA       NA     NA
   2         NA      NA      NA      78       52     85
   3         NA      93      104     NA       NA     NA

I tried some basic R functions (table, apply, etc.) but couldn't deal with the dependence across the columns.

Thanks in advance for any help that brings me closer to the second, 'combined' data frame.

This seems like a simple dcast from the reshape2 package:

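library(reshape2)

# dat is the three-column data frame from the question
dcast(dat, id ~ person_name,
      fun.aggregate = mean,         # average repeated measurements within a period
      value.var = "person_weight",
      fill = NA_real_)              # NA where a person was not measured in a period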
  id Carol Dave David James Mike Simone
1  1    55   66    NA    NA 78.5     NA
2  2    NA   NA    85    78   NA     52
3  3    NA  104    NA    NA 93.0     NA
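
Since the question mentions trying base R functions, here is a sketch of how the same table might be produced without reshape2, using `tapply`. This is an alternative to the dcast call above, not part of the original answer. The object name `dat` and the result name `res` are assumptions; `group_id` is taken from the desired output in the question.

```r
# Example data from the question (`dat` is an assumed name).
dat <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  person_name = c("Carol", "Mike", "Mike", "Dave", "Carol",
                  "James", "Simone", "Simone", "David",
                  "Mike", "Dave", "Dave"),
  person_weight = c(51, 76, 81, 66, 59, 78, 55, 49, 85, 93, 110, 98),
  stringsAsFactors = FALSE
)

# Mean weight for every (id, person_name) combination.
# tapply returns a matrix; combinations with no rows are left as NA.
m <- tapply(dat$person_weight, list(dat$id, dat$person_name), mean)

# Convert the matrix to a data frame with the period id as first column.
res <- data.frame(group_id = as.integer(rownames(m)), m, check.names = FALSE)
print(res)
```

The person columns come out in alphabetical order, as in the dcast result, and the values should match it.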
