I have looked through other posts and I think I have an idea of what I could do, but I want to be clear!
I have a very large data frame that contains 4 variables and a number of rows.
  Chain ResId ResNum    Energy
1     C   O17    500 -37.03670
2     A   ARG      8  -0.84560
3     A   LEU     24  -0.56739
4     A   ASP     25  -0.98583
5     B   ARG      8  -0.64880
6     B   LEU     24  -0.58380
7     B   ASP     25  -0.85930
Each row contains Chain (A, B, or C), ResId, ResNum, and Energy. I would like to sort this data so that all of the energy values belonging to a specific ResId and ResNum are clustered together across chains. By "clustered" I mean that either all of the values for "ARG 8" are grouped, or all of the rows containing "ARG 8" are grouped; I don't know which is more efficient. Ideally, I would like the output for every residue to look like this:
ARG 8
0.000
0.000
0.000
where each "0.000" stands for one of the energy values for that residue (ARG 8, O17 500, and so on).
Sorry for the header breaks, I wanted the data to be clean, but I can't insert images.
data
df <- structure(list(Chain = structure(c(3L, 1L, 1L, 1L, 2L, 2L, 2L
), .Label = c("A", "B", "C"), class = "factor"), ResId = structure(c(4L,
1L, 3L, 2L, 1L, 3L, 2L), .Label = c("ARG", "ASP", "LEU", "O17"
), class = "factor"), ResNum = c(500L, 8L, 24L, 25L, 8L, 24L,
25L), Energy = c(-37.0367, -0.8456, -0.56739, -0.98583, -0.6488,
-0.5838, -0.8593)), .Names = c("Chain", "ResId", "ResNum", "Energy"
), class = "data.frame", row.names = c(NA, -7L))
After your edit, the output you are most likely looking for is:
library(reshape2)
dcast(df, ResId~Chain, value.var= 'Energy')
  ResId        A       B        C
1   ARG -0.84560 -0.6488       NA
2   ASP -0.98583 -0.8593       NA
3   LEU -0.56739 -0.5838       NA
4   O17       NA      NA -37.0367
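This puts the energy values for each residue side by side, with one column per chain. If you would rather keep the long format and only group the rows for each residue together, sort by ResId instead: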
df[order(df$ResId), ]
  Chain ResId ResNum    Energy
2     A   ARG      8  -0.84560
5     B   ARG      8  -0.64880
4     A   ASP     25  -0.98583
7     B   ASP     25  -0.85930
3     A   LEU     24  -0.56739
6     B   LEU     24  -0.58380
1     C   O17    500 -37.03670
# With dplyr
library(dplyr)
df %>%
  arrange(ResId)
  Chain ResId ResNum    Energy
1     A   ARG      8  -0.84560
2     B   ARG      8  -0.64880
3     A   ASP     25  -0.98583
4     B   ASP     25  -0.85930
5     A   LEU     24  -0.56739
6     B   LEU     24  -0.58380
7     C   O17    500 -37.03670
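If what you want is closer to the layout in the question (a residue label followed by its energy values), one option is to split the Energy column by a combined residue key. This is only a sketch in base R: the names residue and energy_by_residue are made up for illustration, and the data frame is rebuilt here from the values in the question so the snippet runs on its own.

```r
# Rebuild the example data from the question
df <- data.frame(
  Chain  = c("C", "A", "A", "A", "B", "B", "B"),
  ResId  = c("O17", "ARG", "LEU", "ASP", "ARG", "LEU", "ASP"),
  ResNum = c(500L, 8L, 24L, 25L, 8L, 24L, 25L),
  Energy = c(-37.0367, -0.8456, -0.56739, -0.98583, -0.6488, -0.5838, -0.8593)
)

# Build one key per residue, e.g. "ARG 8" (hypothetical helper name)
residue <- paste(df$ResId, df$ResNum)

# Named list: one numeric vector of energies per residue key
energy_by_residue <- split(df$Energy, residue)

# Energies for a single residue, in the original row order
energy_by_residue[["ARG 8"]]
```

Printing energy_by_residue shows each residue key followed by its values, which is roughly the "label, then energies" layout asked for.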
df <- read.table(text = '
Chain ResId ResNum Energy
C O17 500 -37.0367
A ARG 8 -0.8456
A LEU 24 -0.56739
A ASP 25 -0.98583
B ARG 8 -0.6488
B LEU 24 -0.5838
B ASP 25 -0.8593', header = TRUE)
If you want to convert to wide format:
library(reshape2)
dcast(df, ResId+ResNum~paste0('Energy.',Chain), value.var='Energy')
#  ResId ResNum Energy.A Energy.B Energy.C
#1   ARG      8 -0.84560  -0.6488       NA
#2   ASP     25 -0.98583  -0.8593       NA
#3   LEU     24 -0.56739  -0.5838       NA
#4   O17    500       NA       NA -37.0367
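If you would rather not depend on reshape2, a similar wide table can probably be built with base R's reshape(). This is a sketch rather than a drop-in replacement: the result name wide is made up, the data is rebuilt from the question, and the row and column order may differ from the dcast output above.

```r
# Rebuild the example data from the question
df <- data.frame(
  Chain  = c("C", "A", "A", "A", "B", "B", "B"),
  ResId  = c("O17", "ARG", "LEU", "ASP", "ARG", "LEU", "ASP"),
  ResNum = c(500L, 8L, 24L, 25L, 8L, 24L, 25L),
  Energy = c(-37.0367, -0.8456, -0.56739, -0.98583, -0.6488, -0.5838, -0.8593)
)

# One row per (ResId, ResNum); one Energy.<Chain> column per chain.
# Combinations that do not occur (e.g. O17 in chain A) become NA.
wide <- reshape(df,
                idvar     = c("ResId", "ResNum"),
                timevar   = "Chain",
                direction = "wide")
wide
```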
Try this:
df <- df[order(df$Chain, df$ResId, df$ResNum),]
where df is the name of your data frame. This sorts the rows by Chain, then ResId, then ResNum.
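Note that sorting by Chain first keeps each chain together, so the "ARG 8" rows from chains A and B end up apart. If the goal is to group each residue across chains, as the question describes, a variant that puts the residue columns first may be closer to what is wanted. The name df_sorted is made up for this sketch, and the data is rebuilt from the question.

```r
# Rebuild the example data from the question
df <- data.frame(
  Chain  = c("C", "A", "A", "A", "B", "B", "B"),
  ResId  = c("O17", "ARG", "LEU", "ASP", "ARG", "LEU", "ASP"),
  ResNum = c(500L, 8L, 24L, 25L, 8L, 24L, 25L),
  Energy = c(-37.0367, -0.8456, -0.56739, -0.98583, -0.6488, -0.5838, -0.8593)
)

# Sort by residue first, then by chain within each residue,
# so rows for the same residue sit next to each other
df_sorted <- df[order(df$ResId, df$ResNum, df$Chain), ]
df_sorted
```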