
How to group rows in my data.frame?

I have a data.frame like this:

x <- data.frame(names=c('NG_1', 'NG_2', 'FG_1', 'FG_2'), score=c(1,2,3,4), label=c('N','N','F','F'))
x
  names score label
1  NG_1     1     N
2  NG_2     2     N
3  FG_1     3     F
4  FG_2     4     F

I want to pair up the rows of the two groups (N, F) by a substring match on the name. For example, NG_1 matches FG_1. I am looking for a result something like this:

y <- data.frame(name1=c('NG_1','NG_2'), name2=c('FG_1', 'FG_2'),   score1=c(1,2), score2=c(3,4))
y
  name1 name2 score1 score2
1  NG_1  FG_1      1      3
2  NG_2  FG_2      2      4

The resulting table doesn't need to look exactly like above, but I do want the scores grouped.

The only way I can think of is to run a for-loop over all rows with label N and match each of them to a row with label F. Is there anything better?

We can do this with data.table. Convert the data.frame to a data.table with setDT(x), create a grouping variable (Grp) from 'label' and a within-group sequence (N), then use dcast (which accepts multiple value.var columns) to reshape from long to wide format. Note that this pairs rows by their position within each label, so it assumes both groups are listed in the same order.

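library(data.table)
# Grp: group number per label (N -> 1, F -> 2, in order of appearance)
setDT(x)[, Grp := .GRP, by = label]
# N: row position within each label
x[, N := seq_len(.N), by = label]
# reshape to wide, one row per position, then drop the helper column N
dcast(x, N ~ Grp, value.var = c('names', 'score'), sep = '')[, N := NULL][]
#     names1 names2 score1 score2
#1:   NG_1   FG_1      1      3
#2:   NG_2   FG_2      2      4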
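
The answer above pairs rows by their position within each label. If the pairing should instead come from the substring itself, as the question asks, a minimal base R sketch could look like the following. It assumes that the part of the name after the first underscore identifies the pair; the helper column `id` and the intermediate data frames `n` and `f` are names made up for this illustration.

```r
x <- data.frame(names = c("NG_1", "NG_2", "FG_1", "FG_2"),
                score = c(1, 2, 3, 4),
                label = c("N", "N", "F", "F"),
                stringsAsFactors = FALSE)

# Assumption: everything after the first underscore is the matching key
x$id <- sub("^[^_]*_", "", x$names)

# Split by label, keeping only the columns needed for the join
n <- x[x$label == "N", c("id", "names", "score")]
f <- x[x$label == "F", c("id", "names", "score")]

# Join on the key; the suffixes distinguish the N side (1) from the F side (2)
y <- merge(n, f, by = "id", suffixes = c("1", "2"))
y <- y[, c("names1", "names2", "score1", "score2")]
y
```

Because the join is on the key rather than on row order, this should still pair the rows correctly if the two groups are listed in different orders. The scores also stay numeric.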

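Here is a way using dplyr/tidyr. Because gather stacks names and score into a single value column, the scores come back as character: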

> require(dplyr)
> require(tidyr)
> x <- data.frame(names=c('NG_1', 'NG_2', 'FG_1', 'FG_2')
+     , score=c(1,2,3,4)
+     , label=c('N','N','F','F')
+     , stringsAsFactors = FALSE
+     )
> x
  names score label
1  NG_1     1     N
2  NG_2     2     N
3  FG_1     3     F
4  FG_2     4     F
> # create new 'label' for grouping
> x$label <- substring(x$names, 4, 4)  # extract grouping criteria
> x %>%
+     gather(key, value, -label) %>%  # wide to long using 'label'
+     group_by(label, key) %>%  # group for adding newkey
+     mutate(newkey = paste(key , seq(length(key)), sep = "_")) %>%
+     ungroup %>%  # remove grouping criteria
+     select(-key) %>%  # remove the 'key' column -- not needed
+     spread(newkey, value) %>%  # long to wide
+     select(-label)  # remove the 'label' column -- not needed
Source: local data frame [2 x 4]

  names_1 names_2 score_1 score_2
    (chr)   (chr)   (chr)   (chr)
1    NG_1    FG_1       1       3
2    NG_2    FG_2       2       4
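
In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. A rough sketch of the same reshaping with pivot_wider follows. It assumes a tidyr version that provides pivot_wider, and the key column `id` and the result name `wide` are made up for this illustration. The new columns are named after the labels (for example names_N and score_F) rather than numbered.

```r
library(tidyr)

x <- data.frame(names = c("NG_1", "NG_2", "FG_1", "FG_2"),
                score = c(1, 2, 3, 4),
                label = c("N", "N", "F", "F"),
                stringsAsFactors = FALSE)

# Assumption: everything after the first underscore is the matching key
x$id <- sub("^[^_]*_", "", x$names)

# One row per key, one column per (value column, label) combination
wide <- pivot_wider(x,
                    id_cols = "id",
                    names_from = "label",
                    values_from = c("names", "score"),
                    names_sep = "_")
wide
```

Since names and score are widened separately instead of being stacked into one column first, score keeps its numeric type.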
