Dataframe, split by column values and put into new columns

I am new to R and am currently setting up my data. My data comes in a format where each row contains a single measurement (DV), with a column indicating the type of measurement (DVID).

Here is an example of my data:

ID  TIME    DV  DVID
1   0   0.0 7
1   1   27.5    1
1   1   0.0 7
1   4   19.6    1
1   4   0.0 7
1   8   17.9    1
1   8   0.0 7
1   12  17.7    1
1   12  0.0 7
1   24  19.6    1
1   24  0.0 7
1   48  32.9    1
1   48  0.0 7
2   0   0.0 7
2   1   0.0 7
2   4   0.0 7
2   8   0.0 7
2   12  0.0 7
2   24  0.0 7
2   48  27.3    1
2   72  30.9    1
2   72  0.0 7
2   96  20.8    1
3   0   1.0 7
3   1   7.0 1
3   1   0.0 7
3   4   15.0    1
3   4   0.0 7
3   8   27.2    1
3   8   0.0 7
3   12  0.0 7
3   24  47.0    1
3   24  0.0 7
3   48  65.4    1
3   48  0.0 7
3   72  68.7    1
3   72  0.0 7
3   96  82.8    1
3   96  0.0 7
3   120 70.5    1

What I want to do is "pair together" the different types of measurements, so that I have one column with the measurements of one type (DVID=1) and another column with the measurements of the other type (DVID=7). I also need to delete the rows where I don't have both types of measurement (or, alternatively, put NA in those fields). An example of the desired result looks like this:

ID  TIME    DV_1    DV_7
1   1   27.5    0
1   4   19.6    0
1   8   17.9    0
1   12  17.7    0
1   24  19.6    0
1   48  32.9    0

The purpose is to be able to plot the DVID = 1 values against the DVID = 7 values. Can anyone here help me with this? I know that I probably have to use functions in the split and apply family, but I have no idea where to start.

Thanks in advance!

Here is one approach, using tidyr and dplyr (foo is assumed to be the data frame holding your data).

library(dplyr)
library(tidyr)

#Create one column for group1 and another for group7 in DVID
ana <- spread(foo, DVID, DV)

colnames(ana) <- c("ID", "TIME", "DV1", "DV7")

# Remove rows which have NA
filter(ana, !is.na(DV1), !is.na(DV7))

#   ID TIME  DV1 DV7
#1   1    1 27.5   0
#2   1    4 19.6   0
#3   1    8 17.9   0
#4   1   12 17.7   0
#5   1   24 19.6   0
#6   1   48 32.9   0
#7   2   72 30.9   0
#8   3    1  7.0   0
#9   3    4 15.0   0
#10  3    8 27.2   0
#11  3   24 47.0   0
#12  3   48 65.4   0
#13  3   72 68.7   0
#14  3   96 82.8   0
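
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same idea with the newer function, not a replacement for the answer above. It assumes a data frame named foo containing a small subset of the question's data; the names wide and paired and the DV_ prefix are chosen here for illustration.

```r
library(tidyr)

# A small subset of the question's data (part of ID 1 and ID 2).
foo <- data.frame(
  ID   = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  TIME = c(0, 1, 1, 4, 4, 48, 72, 72, 96),
  DV   = c(0, 27.5, 0, 19.6, 0, 27.3, 30.9, 0, 20.8),
  DVID = c(7, 1, 7, 1, 7, 1, 1, 7, 1)
)

# One column per DVID value; names_prefix yields DV_1 and DV_7.
wide <- pivot_wider(foo, names_from = DVID, values_from = DV,
                    names_prefix = "DV_")

# Keep only the ID/TIME combinations that have both measurement types.
paired <- drop_na(wide, DV_1, DV_7)
paired
```

Skipping the drop_na() step keeps the unpaired rows with NA in the missing column, which is the alternative mentioned in the question.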

Another way is to convert your data frame to a data.table:

library(data.table)
setDT(foo)

bob <- dcast.data.table(foo, ID + TIME ~ DVID, value.var = "DV")

setnames(bob, c("1","7"), c("DV1", "DV7"))[!DV1 %in% NA & !DV7 %in% NA, ]

Update

Following @Arun's advice, with data.table 1.9.5 or later the third line can instead use na.omit. In released versions of data.table the column argument is named cols:

na.omit(bob, cols = c("1", "7"))

It looks like you want to reshape your data from long to wide format. Use cast from the reshape package.

library(reshape)

# read data
dfX = read.table(textConnection("ID  TIME   DV  DVID
1   0   0.0 7
1   1   27.5    1
               1    1   0.0 7
               1    4   19.6    1
               1    4   0.0 7
               1    8   17.9    1
               1    8   0.0 7
               1    12  17.7    1
               1    12  0.0 7
               1    24  19.6    1
               1    24  0.0 7
               1    48  32.9    1
               1    48  0.0 7
               2    0   0.0 7
               2    1   0.0 7
               2    4   0.0 7
               2    8   0.0 7
               2    12  0.0 7
               2    24  0.0 7
               2    48  27.3    1
               2    72  30.9    1
               2    72  0.0 7
               2    96  20.8    1
               3    0   1.0 7
               3    1   7.0 1
               3    1   0.0 7
               3    4   15.0    1
               3    4   0.0 7
               3    8   27.2    1
               3    8   0.0 7
               3    12  0.0 7
               3    24  47.0    1
               3    24  0.0 7
               3    48  65.4    1
               3    48  0.0 7
               3    72  68.7    1
               3    72  0.0 7
               3    96  82.8    1
               3    96  0.0 7
               3    120 70.5    1"), header = TRUE)

# reshape the data
reshape::cast(dfX, ID + TIME ~ DVID, value = "DV")

Here is the output:

> reshape::cast(dfX, ID + TIME ~ DVID, value = "DV")
   ID TIME    1  7
1   1    0   NA  0
2   1    1 27.5  0
3   1    4 19.6  0
4   1    8 17.9  0
5   1   12 17.7  0
6   1   24 19.6  0
7   1   48 32.9  0
8   2    0   NA  0
9   2    1   NA  0
10  2    4   NA  0
11  2    8   NA  0
12  2   12   NA  0
13  2   24   NA  0
14  2   48 27.3 NA
15  2   72 30.9  0
16  2   96 20.8 NA
17  3    0   NA  1
18  3    1  7.0  0
19  3    4 15.0  0
20  3    8 27.2  0
21  3   12   NA  0
22  3   24 47.0  0
23  3   48 65.4  0
24  3   72 68.7  0
25  3   96 82.8  0
26  3  120 70.5 NA
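
The cast() output still contains the rows where one of the two measurements is missing. A minimal base R sketch of dropping them, assuming a data frame shaped like that output: the object wide below is a hand-built stand-in holding five of the rows shown above, and the names DV_1 and DV_7 follow the question's desired layout.

```r
# Hypothetical stand-in for a few rows of the cast() output above;
# check.names = FALSE keeps the column names "1" and "7" as they are.
wide <- data.frame(
  ID   = c(1, 1, 2, 2, 3),
  TIME = c(0, 1, 48, 72, 120),
  `1`  = c(NA, 27.5, 27.3, 30.9, 70.5),
  `7`  = c(0, 0, NA, 0, NA),
  check.names = FALSE
)

# Give the measurement columns syntactic names.
names(wide)[names(wide) == "1"] <- "DV_1"
names(wide)[names(wide) == "7"] <- "DV_7"

# Keep only rows where both measurements are present.
paired <- wide[complete.cases(wide[, c("DV_1", "DV_7")]), ]
paired
```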

In addition, you could use reshape from base R (df is defined under "data" below):

 na.omit(reshape(df, idvar = c("ID","TIME"),
               timevar="DVID", direction = "wide"))[,c(1:2,4:3)]

 #    ID TIME DV.1 DV.7
 #2   1    1 27.5    0
 #4   1    4 19.6    0
 #6   1    8 17.9    0
 #8   1   12 17.7    0
 #10  1   24 19.6    0
 #12  1   48 32.9    0
 #21  2   72 30.9    0
 #25  3    1  7.0    0
 #27  3    4 15.0    0
 #29  3    8 27.2    0
 #32  3   24 47.0    0
 #34  3   48 65.4    0
 #36  3   72 68.7    0
 #38  3   96 82.8    0

data

 df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), TIME = c(0L, 
 1L, 1L, 4L, 4L, 8L, 8L, 12L, 12L, 24L, 24L, 48L, 48L, 0L, 1L, 
 4L, 8L, 12L, 24L, 48L, 72L, 72L, 96L, 0L, 1L, 1L, 4L, 4L, 8L, 
 8L, 12L, 24L, 24L, 48L, 48L, 72L, 72L, 96L, 96L, 120L), DV = c(0, 
 27.5, 0, 19.6, 0, 17.9, 0, 17.7, 0, 19.6, 0, 32.9, 0, 0, 0, 0, 
 0, 0, 0, 27.3, 30.9, 0, 20.8, 1, 7, 0, 15, 0, 27.2, 0, 0, 47, 
 0, 65.4, 0, 68.7, 0, 82.8, 0, 70.5), DVID = c(7L, 1L, 7L, 1L, 
 7L, 1L, 7L, 1L, 7L, 1L, 7L, 1L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 1L, 
 1L, 7L, 1L, 7L, 1L, 7L, 1L, 7L, 1L, 7L, 7L, 1L, 7L, 1L, 7L, 1L, 
7L, 1L, 7L, 1L)), .Names = c("ID", "TIME", "DV", "DVID"), class = "data.frame", row.names = c(NA, 
-40L))
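
Since the question mentions splitting the data, here is one more base R sketch under stated assumptions: split the rows by DVID with subset(), join the two pieces on ID and TIME with merge(), and plot the paired values. The df below is a small subset of the question's data (the first time points of ID 3), and the names dv1, dv7 and paired are chosen here for illustration.

```r
# Subset of the question's data (ID 3, first few time points).
df <- data.frame(
  ID   = c(3, 3, 3, 3, 3, 3, 3),
  TIME = c(0, 1, 1, 4, 4, 8, 8),
  DV   = c(1, 7, 0, 15, 0, 27.2, 0),
  DVID = c(7, 1, 7, 1, 7, 1, 7)
)

# Split into one data frame per measurement type.
dv1 <- subset(df, DVID == 1, select = c(ID, TIME, DV))
dv7 <- subset(df, DVID == 7, select = c(ID, TIME, DV))

# Inner join: only ID/TIME combinations present in both pieces survive.
# The suffixes turn the two DV columns into DV_1 and DV_7.
paired <- merge(dv1, dv7, by = c("ID", "TIME"), suffixes = c("_1", "_7"))
paired

# Plot the DVID = 1 values against the DVID = 7 values.
plot(paired$DV_1, paired$DV_7,
     xlab = "DV (DVID = 1)", ylab = "DV (DVID = 7)")
```

Passing all = TRUE to merge() would keep the unpaired rows and fill the missing measurement with NA instead of dropping them.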
