I have data arranged like this in R:
indv time val
A 6 5
A 10 10
A 12 7
B 8 4
B 10 3
B 15 9
For each individual (indv) at each time, I want to calculate the change in value (val) from the initial time. So I would end up with something like this:
indv time val val_1 val_change
A 6 5 5 0
A 10 10 5 5
A 12 7 5 2
B 8 4 4 0
B 10 3 4 -1
B 15 9 4 5
Can anyone tell me how I might do this? I can use
ddply(df, .(indv), function(x)x[which.min(x$time), ])
to get a table like
indv time val
A 6 5
B 8 4
However, I cannot figure out how to make a column val_1 in which each individual's value at its minimum time is repeated on every row for that individual. If I can do that, I should be able to add the column val_change using something like:
df$val_change <- df$val - df$val_1
EDIT: Two excellent methods were posted below; however, both rely on my time column being sorted so that smaller time values come before larger ones. I'm not sure this will always be the case with my data. (I know I can sort first in Excel, but I'm trying to avoid that.) How could I deal with a case where the table looks like this:
indv time value
A 10 10
A 6 5
A 12 7
B 8 4
B 10 3
B 15 9
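Here is a data.table solution that is memory efficient because it adds the new columns by reference within the data.table. Setting the key sorts the table by the key variables, so val[1] is each individual's value at its earliest time even if the input was not ordered: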
library(data.table)
DT <- data.table(df)
# set key to sort by indv then time
setkey(DT, indv, time)
DT[, c('val1','change') := list(val[1], val - val[1]),by = indv]
# And to show it works....
DT
## indv time val val1 change
## 1: A 6 5 5 0
## 2: A 10 10 5 5
## 3: A 12 7 5 2
## 4: B 8 4 4 0
## 5: B 10 3 4 -1
## 6: B 15 9 4 5
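If you would rather not reorder the table at all, one possible variant is to pick the baseline with which.min(time) inside each group instead of relying on val[1]. This is only a sketch: the column names val_1 and val_change follow the question, and the data are the unsorted rows from the question's edit.

```r
library(data.table)

# Same rows as the question's edit, deliberately not sorted by time
DT <- data.table(
  indv = c("A", "A", "A", "B", "B", "B"),
  time = c(10, 6, 12, 8, 10, 15),
  val  = c(10, 5, 7, 4, 3, 9)
)

# val[which.min(time)] is each individual's value at its earliest time,
# so the result does not depend on row order and no setkey() is needed
DT[, val_1 := val[which.min(time)], by = indv]
DT[, val_change := val - val_1]
```

The rows stay in their original order; only the two new columns are added by reference.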
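Here's a plyr solution using ddply (it takes the first row of each group as the baseline, so it assumes the rows are already ordered by time):
library(plyr)
ddply(df, .(indv), transform,
      val_1 = val[1],
      change = (val - val[1]))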
indv time val val_1 change
1 A 6 5 5 0
2 A 10 10 5 5
3 A 12 7 5 2
4 B 8 4 4 0
5 B 10 3 4 -1
6 B 15 9 4 5
To get your second table try this:
ddply(df, .(indv), function(x) x[which.min(x$time), ])
indv time val
1 A 6 5
2 B 8 4
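That one-row-per-individual table can also be joined back onto the full data to build val_1, which is roughly what the question was reaching for. A minimal base R sketch, assuming the question's column names (first and out are just illustrative names):

```r
df <- data.frame(
  indv = c("A", "A", "A", "B", "B", "B"),
  time = c(10, 6, 12, 8, 10, 15),
  val  = c(10, 5, 7, 4, 3, 9)
)

# One row per individual: the row with the smallest time
first <- do.call(rbind, lapply(split(df, df$indv),
                               function(x) x[which.min(x$time), c("indv", "val")]))
names(first)[names(first) == "val"] <- "val_1"

# Join the baseline back onto every row of that individual
out <- merge(df, first, by = "indv")
out$val_change <- out$val - out$val_1

# Order by individual and time for display
out <- out[order(out$indv, out$time), ]
```

Because which.min() looks at the time column directly, this does not require the input to be sorted.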
To deal with unsorted data, like the table you posted in your edit, try the following:
unsort <- read.table(text="indv time value
A 10 10
A 6 5
A 12 7
B 8 4
B 10 3
B 15 9", header=T)
do.call(rbind, lapply(split(unsort, unsort$indv),
function(x) x[order(x$time), ]))
indv time value
A.2 A 6 5
A.1 A 10 10
A.3 A 12 7
B.4 B 8 4
B.5 B 10 3
B.6 B 15 9
Now you can apply the procedure described above to this sorted data frame.
A shorter way to sort your data frame is the orderBy function from the doBy package:
library(doBy)
orderBy(~ indv + time, unsort)
indv time value
2 A 6 5
1 A 10 10
3 A 12 7
4 B 8 4
5 B 10 3
6 B 15 9
You can even sort your df using ddply, although note that the columns come back in a different order:
ddply(unsort, .(indv, time), sort)
value time indv
1 5 6 A
2 10 10 A
3 7 12 A
4 4 8 B
5 3 10 B
6 9 15 B
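If you prefer to avoid extra packages for the sorting step, base order() accepts several keys at once. A small sketch using the same unsort data (sorted is just an illustrative name):

```r
unsort <- read.table(text = "indv time value
A 10 10
A 6 5
A 12 7
B 8 4
B 10 3
B 15 9", header = TRUE)

# Sort by indv first, then by time within each indv
sorted <- unsort[order(unsort$indv, unsort$time), ]
```

The column order is left untouched and the original row names are kept, so you can still see where each row came from.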
You can do this with base functions. Using your data:
df <- read.table(text = "indv time val
A 6 5
A 10 10
A 12 7
B 8 4
B 10 3
B 15 9", header = TRUE)
We first split() df on the indv variable:
sdf <- split(df, df$indv)
Next we transform each component of sdf, adding the val_1 and val_change variables in a manner similar to how you suggest:
sdf <- lapply(sdf, function(x) transform(x, val_1 = val[1],
val_change = val - val[1]))
Finally we arrange for the individual components to be bound row wise into a single data frame:
df <- do.call(rbind, sdf)
df
Which gives:
R> df
indv time val val_1 val_change
A.1 A 6 5 5 0
A.2 A 10 10 5 5
A.3 A 12 7 5 2
B.4 B 8 4 4 0
B.5 B 10 3 4 -1
B.6 B 15 9 4 5
To address the sorting issue the OP raises in the comments, modify the lapply() call to include a sorting step prior to the transform(). For example:
sdf <- lapply(sdf, function(x) {
x <- x[order(x$time), ]
transform(x, val_1 = val[1],
val_change = val - val[1])
})
In use we have
## scramble `df`
df <- df[sample(nrow(df)), ]
## split
sdf <- split(df, df$indv)
## apply sort and transform
sdf <- lapply(sdf, function(x) {
x <- x[order(x$time), ]
transform(x, val_1 = val[1],
val_change = val - val[1])
})
## combine
df <- do.call(rbind, sdf)
which again gives:
R> df
indv time val val_1 val_change
A.1 A 6 5 5 0
A.2 A 10 10 5 5
A.3 A 12 7 5 2
B.4 B 8 4 4 0
B.5 B 10 3 4 -1
B.6 B 15 9 4 5
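As a possible shortcut, the split/lapply/rbind steps can be collapsed with ave(), which returns a vector aligned with the original rows. This is a sketch rather than a drop-in replacement: it leaves the rows in their original order instead of sorting them, and it uses the unsorted data from the question's edit.

```r
df <- data.frame(
  indv = c("A", "A", "A", "B", "B", "B"),
  time = c(10, 6, 12, 8, 10, 15),
  val  = c(10, 5, 7, 4, 3, 9)
)

# ave() hands FUN the row indices of each indv group; FUN returns that
# group's val at its smallest time, which ave() repeats across the group
df$val_1 <- ave(seq_len(nrow(df)), df$indv,
                FUN = function(i) df$val[i][which.min(df$time[i])])
df$val_change <- df$val - df$val_1
```

Passing row indices rather than val itself is what lets FUN look at two columns (time and val) at once.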