
Transform rows to columns based on certain criteria

I have a dataset with attributes and their corresponding values, as shown below:

 Obs#     Id     Class   Date            MedicationName        Dose        BloodTestResult
 1        1433   1       2007/01/01      Sitaglyptin           100mg       6.2
 2        1433   1       2007/03/24      Sitaglyptin           100mg       6.4
 3        1433   1       2007/06/15      Sitaglyptin           100mg       6.5
 4        1433   2       2007/09/25      Glucophage            10mg        6.7
 5        1433   2       2007/12/30      Glucophage            10mg        6.5
 6        1433   2       2008/02/01      Glucophage            10mg        6.6
 7        1433   3       2008/05/03      Glumetza              10mg        7.2
 8        1433   3       2008/08/10      Glumetza              10mg        6.4
 9        1433   3       2008/11/14      Glumetza              20mg        6.7
10        1433   3       2009/02/02      Glumetza              20mg        6.5
11        8348   3       2007/04/11      Glumetza              20mg        6.5
12        8348   3       2007/07/15      Glumetza              20mg        6.6

I would like to transform it into a dataset like this:

 Obs#  Id    Class  Date1       MedicationName1  Dose1  Date2       MedicationName2  Dose2  Date3       MedicationName3  Dose3  BloodTestResult
 1     1433  1      2007/01/01  Sitaglyptin      100mg  2007/03/24  Sitaglyptin      100mg  2007/06/15  Sitaglyptin      100mg  6.7
 2     1433  2      2007/09/25  Glucophage       10mg   2007/12/30  Glucophage       10mg   2008/02/01  Glucophage       10mg   7.2
 3     1433  3      2008/05/03  Glumetza         10mg   2008/08/10  Glumetza         10mg   -           -                -      6.7
 4     1433  3      2008/11/14  Glumetza         20mg   2009/02/02  Glumetza         20mg   -           -                -      6.5
 5     8348  3      2007/04/11  Glumetza         20mg   2007/07/15  Glumetza         20mg   -           -                -      6.6

The dataset above is transformed from rows to columns whenever any of the following criteria is met.

Scenario 1) Change in medication (MedicationName) or change in dosage (Dose)

    Observations 1, 2, 3 have the same medication (Sitaglyptin) and the same
    dose (100mg). So these three rows (1, 2, 3) are transformed into one row
    (row 1) as shown in the transformed dataset, and
    the last column, BloodTestResult, will contain the value from the 4th row (6.7).

    Similarly for rows 4, 5, 6, because of the medication change (Glucophage).
    These three rows (4, 5, 6) are transformed into a single row 2 as shown in
    the new dataset, and
    the last column, BloodTestResult, will contain the value from the 7th row (7.2).

    Similarly for rows 7 and 8, because of the medication change (Glumetza).
    These two rows (7, 8) are transformed into a single row 3 as shown in the
    new dataset, and
    the last column, BloodTestResult, will contain the value from the 9th row (6.7).

Scenario 2) Change in medication (MedicationName) or change in dosage (Dose)

    Rows 9, 10 are transformed into a single row 4 as shown in the new dataset
    because of the dosage change (20mg), and
    the last column, BloodTestResult, will contain the value from the 10th row
    (6.5) and not the 11th row, because this is the last
    medication/dosage change for Id 1433.

Scenario 3) Last medication on record for that patient Id

    Rows 11, 12 represent the only or last available information regarding
    Id 8348. So they are simply transformed into a single row 5 as shown in the
    transformed dataset, and
    the last column, BloodTestResult, will contain the value from the 12th row
    (6.6), because this is the last
    medication/dosage change for the id 8348
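
The grouping rule behind the three scenarios can be sketched roughly as follows: consecutive rows belong to the same block as long as Id, MedicationName and Dose all stay the same. This is only an illustrative sketch on a toy copy of observations 7 to 12; the names toy, key, starts and run are made up for the example and do not appear in the original post.

```r
# Toy copy of observations 7-12 from the question (names are illustrative)
toy <- data.frame(
  Id             = c(1433, 1433, 1433, 1433, 8348, 8348),
  MedicationName = rep("Glumetza", 6),
  Dose           = c("10mg", "10mg", "20mg", "20mg", "20mg", "20mg"),
  stringsAsFactors = FALSE
)

# One string per row that changes whenever Id, medication or dose changes
key <- paste(toy$Id, toy$MedicationName, toy$Dose, sep = "|")

# TRUE on the first row and on every row whose key differs from the row before
starts <- c(TRUE, key[-1] != key[-length(key)])

# A running count of block starts gives one block number per row
toy$run <- cumsum(starts)
toy$run
# 1 1 2 2 3 3
```

Each distinct value of run corresponds to one row of the desired wide dataset (here rows 3, 4 and 5).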

I apologize if this is chaotic; hopefully I have explained the pattern for transforming this dataset with some clarity. I would appreciate any help in transforming the dataset according to these requirements.

Data

df <- structure(list(Obs = 1:12, Id = c(1433L, 1433L, 1433L, 1433L, 
1433L, 1433L, 1433L, 1433L, 1433L, 1433L, 8348L, 8348L), Class = c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), Date = structure(c(1L, 
2L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 3L, 5L), .Label = c("2007/01/01", 
"2007/03/24", "2007/04/11", "2007/06/15", "2007/07/15", "2007/09/25", 
"2007/12/30", "2008/02/01", "2008/05/03", "2008/08/10", "2008/11/14", 
"2009/02/02"), class = "factor"), MedicationName = structure(c(3L, 
3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Glucophage", 
"Glumetza", "Sitaglyptin"), class = "factor"), Dose = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("100mg", 
"10mg", "20mg"), class = "factor"), BloodTestResult = c(6.2, 
6.4, 6.5, 6.7, 6.5, 6.6, 7.2, 6.4, 6.7, 6.5, 6.5, 6.6)), .Names = c("Obs", 
"Id", "Class", "Date", "MedicationName", "Dose", "BloodTestResult"
), class = "data.frame", row.names = c(NA, -12L))

This is kind of a tricky data transformation, especially the BloodTestResult column, since it requires data from outside the initial groupings of Id, Class (or MedicationName), and Dose. Breaking it into steps, you could try the following (I've called the data dat):

## The question's data frame is named df; this answer calls it dat
dat <- df

## First split the data by Id, Class and Dose
groups <- split(dat, interaction(dat$Id, dat$Class, dat$Dose, drop = TRUE))

## Then, for each group, keep Obs/Id/Class from the first row and spread the
## remaining columns (except BloodTestResult) of every row side by side
tmp <- lapply(groups, function(x)
    cbind(x[1, 1:3], do.call(cbind, split(x[, -c(1:3, ncol(x))], 1:nrow(x)))))

## Put back into a single data.frame
library(plyr)  # for rbind.fill, since some data.frames are missing columns
res <- do.call(rbind.fill, tmp)

## Finally, add the blood test: within each Id, take the value on every row
## where Dose or Class changes, plus the last value for that Id
res$BloodTestResult <- unlist(lapply(split(dat, dat$Id), function(x)
    c(x$BloodTestResult[c(FALSE, !(tail(x$Dose, -1) == head(x$Dose, -1) &
                                     tail(x$Class, -1) == head(x$Class, -1)))],
      tail(x$BloodTestResult, 1))), use.names = FALSE)

#   Obs   Id Class     1.Date 1.MedicationName 1.Dose     2.Date 2.MedicationName
# 1   1 1433     1 2007/01/01      Sitaglyptin  100mg 2007/03/24      Sitaglyptin
# 2   4 1433     2 2007/09/25       Glucophage   10mg 2007/12/30       Glucophage
# 3   7 1433     3 2008/05/03         Glumetza   10mg 2008/08/10         Glumetza
# 4   9 1433     3 2008/11/14         Glumetza   20mg 2009/02/02         Glumetza
# 5  11 8348     3 2007/04/11         Glumetza   20mg 2007/07/15         Glumetza
#   2.Dose     3.Date 3.MedicationName 3.Dose BloodTestResult
# 1  100mg 2007/06/15      Sitaglyptin  100mg             6.7
# 2   10mg 2008/02/01       Glucophage   10mg             7.2
# 3   10mg       <NA>             <NA>   <NA>             6.7
# 4   20mg       <NA>             <NA>   <NA>             6.5
# 5   20mg       <NA>             <NA>   <NA>             6.6
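
As an aside, the "spread each group's rows into columns" step could probably also be done with base R's reshape() instead of split/cbind/rbind.fill. The sketch below is not part of the original answer; it assumes a block number per row is already available (the run column here is hypothetical), and the toy data frame long only holds a few of the question's rows.

```r
# Hypothetical long-format input: 'run' marks which output row each
# observation belongs to (block 1 has three rows, block 2 has two)
long <- data.frame(
  run  = c(1, 1, 1, 2, 2),
  Date = c("2007/01/01", "2007/03/24", "2007/06/15", "2008/05/03", "2008/08/10"),
  Dose = c("100mg", "100mg", "100mg", "10mg", "10mg"),
  stringsAsFactors = FALSE
)

# Position of each row inside its block: 1, 2, 3, 1, 2
long$seq <- ave(seq_along(long$run), long$run, FUN = seq_along)

# One row per block; Date and Dose become Date.1, Dose.1, Date.2, ...
# Blocks with fewer rows are padded with NA
wide <- reshape(long, idvar = "run", timevar = "seq", direction = "wide")
names(wide)
# "run" "Date.1" "Dose.1" "Date.2" "Dose.2" "Date.3" "Dose.3"
```

The padding with NA plays the same role as rbind.fill in the code above; the column names come out as Date.1 rather than 1.Date.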

The BloodTestResult column is calculated by first splitting the data by Id, then looking for rows where either Dose or Class changes and extracting the BloodTestResult at those rows, and finally appending the last BloodTestResult for each Id.
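
To make that last step easier to follow, here is the same logic pulled out into a small named function and run on just the columns it needs. The helper name blood_for_id and the trimmed-down data frame are illustrative additions, not part of the original answer; the values are copied from the question's data.

```r
# Only the columns the blood-test step uses, copied from the question's data
dat <- data.frame(
  Id    = c(rep(1433L, 10), rep(8348L, 2)),
  Class = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  Dose  = c(rep("100mg", 3), rep("10mg", 5), rep("20mg", 4)),
  BloodTestResult = c(6.2, 6.4, 6.5, 6.7, 6.5, 6.6, 7.2, 6.4, 6.7, 6.5, 6.5, 6.6),
  stringsAsFactors = FALSE
)

blood_for_id <- function(x) {
  # Compare each row with the one before it: TRUE where Dose or Class differs
  changed <- tail(x$Dose, -1) != head(x$Dose, -1) |
             tail(x$Class, -1) != head(x$Class, -1)
  # The first row of an Id has no predecessor, hence the leading FALSE
  at_change <- x$BloodTestResult[c(FALSE, changed)]
  # The final block of each Id is closed with that Id's last measurement
  c(at_change, tail(x$BloodTestResult, 1))
}

blood <- unlist(lapply(split(dat, dat$Id), blood_for_id), use.names = FALSE)
blood
# 6.7 7.2 6.7 6.5 6.6
```

For Id 1433 the changes fall on observations 4, 7 and 9, which gives 6.7, 7.2 and 6.7, followed by the last value 6.5; Id 8348 has no change, so only its last value 6.6 is kept.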


 