Grouping ordered elements by adjacent correlation in R

I have a variable "markr" whose members are arranged in order; the correlation between each pair of consecutive members is given in the variable "corr".

markr <- c("A", "B", "C", "D", "E",  "g", "A1", "B1", "cc", "dd", 
     "f", "gg", "h", "K")
corr <- c(     1,   1,   1,   1, 0.96,   0.5,  0.96,        1 ,   1 ,  
       1 ,  0.85, 0.99, 1)

I need to group markr based on corr without changing the order of the members of markr. The grouping is best explained by the following diagram:

(image: enter image description here)

Consecutive members of the object markr whose corr is greater than 0.95 belong to the same group. Starting from the first value, when corr drops below 0.95 a second group starts and continues until corr drops below 0.95 again; the process continues to the end of the data. Each group is named after its first and last members, for example A-g, A1-f, gg-K.

Thus the expected output is:

markr <- c("A", "B", "C", "D", "E",  "g", 
           "A1", "B1", "cc", "dd", "f", 
           "gg", "h", "K")
group <- c("A-g", "A-g", "A-g", "A-g", "A-g", "A-g",
           "A1-f", "A1-f", "A1-f", "A1-f", "A1-f",
           "gg-K", "gg-K", "gg-K")
dataf <- data.frame(markr, group)

dataf 

 markr group
1      A   A-g
2      B   A-g
3      C   A-g
4      D   A-g
5      E   A-g
6      g   A-g
7     A1  A1-f
8     B1  A1-f
9     cc  A1-f
10    dd  A1-f
11     f  A1-f
12    gg  gg-K
13     h  gg-K
14     K  gg-K

How can I automate this process? I have a very large dataset of this kind.

The group number of a marker is one plus the number of corr values below 0.95 seen before it:

d1 <- data.frame(
  marker = markr,
  group = cumsum(c(1, corr < .95))
)
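
To see why this works, it may help to look at the intermediate values. The sketch below repeats the question's data so it runs on its own, and assumes (as the question reads) that corr[i] is the correlation between markr[i] and markr[i + 1]. The names breaks and group_id are only illustrative.

```r
markr <- c("A", "B", "C", "D", "E", "g", "A1", "B1", "cc", "dd",
           "f", "gg", "h", "K")
corr <- c(1, 1, 1, 1, 0.96, 0.5, 0.96, 1, 1, 1, 0.85, 0.99, 1)

# TRUE wherever the link to the next marker is weak, i.e. a new group
# starts at the following marker.
breaks <- corr < 0.95
which(breaks)
# 6 11   (after "g" and after "f")

# Prepend 1 for the first marker, which always opens group 1,
# then count the breaks seen so far.
group_id <- cumsum(c(1, breaks))
group_id
# 1 1 1 1 1 1 2 2 2 2 2 3 3 3
```

Because corr has one element fewer than markr, prepending the 1 also makes the result the same length as markr.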

For the group names, you can use ddply (from the plyr package) to cut the data.frame into pieces, one per group; it is then easy to extract the first and last elements.

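library(plyr)
# One row per group: paste the first and last marker of each piece together.
d2 <- ddply(
  d1, "group", summarize,
  group_name = paste(head(marker, 1), tail(marker, 1), sep = "-")
)
# Attach the group name to every marker via the numeric group id.
d <- merge(d1, d2, by = "group")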
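
If you would rather not depend on plyr, the same naming step can probably be done in base R with ave(). This is a minimal sketch of an alternative, not the approach above; it repeats the question's data so it is self-contained, and group_id and group_name are illustrative names.

```r
markr <- c("A", "B", "C", "D", "E", "g", "A1", "B1", "cc", "dd",
           "f", "gg", "h", "K")
corr <- c(1, 1, 1, 1, 0.96, 0.5, 0.96, 1, 1, 1, 0.85, 0.99, 1)

# Same group numbering as before.
group_id <- cumsum(c(1, corr < 0.95))

# ave() applies FUN within each group and returns a vector aligned with
# markr, so no merge is needed and the original order is untouched.
group_name <- ave(markr, group_id,
                  FUN = function(m) paste(m[1], m[length(m)], sep = "-"))

dataf <- data.frame(markr, group = group_name)
dataf
```

The last group is named from the marker exactly as stored, so it comes out as gg-K.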
