How to group rows in my data.frame?

I have a data.frame like this:

x <- data.frame(names=c('NG_1', 'NG_2', 'FG_1', 'FG_2'), score=c(1,2,3,4), label=c('N','N','F','F'))
x
  names score label
1  NG_1     1     N
2  NG_2     2     N
3  FG_1     3     F
4  FG_2     4     F

I want to pair up the rows of the two groups (N, F) by matching a substring of the name. For example, NG_1 matches FG_1. I am looking for a result something like this:

y <- data.frame(name1=c('NG_1','NG_2'), name2=c('FG_1', 'FG_2'),   score1=c(1,2), score2=c(3,4))
y
  name1 name2 score1 score2
1  NG_1  FG_1      1      3
2  NG_2  FG_2      2      4

The resulting table doesn't need to look exactly like the one above, but I do want the scores grouped.

The only way I can think of is to run a for-loop over all rows with label 'N' and match each of them to a row with label 'F'. Is there anything better?

We can do this with data.table. Convert the 'data.frame' to a 'data.table' with setDT(x), create a grouping variable ("Grp") and a within-group sequence ("N") based on 'label', then use dcast (which can take multiple value.var columns) to reshape from 'long' to 'wide' format.

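library(data.table)
setDT(x)[, Grp := .GRP, by = label]   # group number per label: N -> 1, F -> 2
x[, N := seq_len(.N), by = label]     # row sequence within each label
# reshape long to wide over both value columns, then drop the helper column N
dcast(x, N ~ Grp, value.var = c('names', 'score'), sep = '')[, N := NULL][]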
#     names1 names2 score1 score2
#1:   NG_1   FG_1      1      3
#2:   NG_2   FG_2      2      4
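
Note that the code above pairs rows by their position within each label, not by the name itself. If the rows might not be in matching order, one possible alternative is to build an explicit key from the part of the name after the underscore and join on it. A minimal base R sketch, assuming every name has the form prefix_suffix; the column `key` and the helper objects `n` and `f` are made up here for illustration:

```r
x <- data.frame(names = c('NG_1', 'NG_2', 'FG_1', 'FG_2'),
                score = c(1, 2, 3, 4),
                label = c('N', 'N', 'F', 'F'),
                stringsAsFactors = FALSE)

# key = everything after the first underscore ("1", "2")
x$key <- sub("^[^_]*_", "", x$names)

# split by label, keeping only the columns needed for the join
n <- x[x$label == "N", c("key", "names", "score")]
f <- x[x$label == "F", c("key", "names", "score")]

# inner join on the key; the suffixes distinguish the N and F columns
y <- merge(n, f, by = "key", suffixes = c("1", "2"))
y
```

Because this is an inner join, a row with no partner in the other group is dropped; passing all = TRUE to merge would keep it with NA in the missing columns.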

Here is a way using dplyr/tidyr:

> require(dplyr)
> require(tidyr)
> x <- data.frame(names=c('NG_1', 'NG_2', 'FG_1', 'FG_2')
+     , score=c(1,2,3,4)
+     , label=c('N','N','F','F')
+     , stringsAsFactors = FALSE
+     )
> x
  names score label
1  NG_1     1     N
2  NG_2     2     N
3  FG_1     3     F
4  FG_2     4     F
> # create new 'label' for grouping
> x$label <- substring(x$names, 4, 4)  # extract grouping criteria
> x %>%
+     gather(key, value, -label) %>%  # wide to long using 'label'
+     group_by(label, key) %>%  # group for adding newkey
+     mutate(newkey = paste(key , seq(length(key)), sep = "_")) %>%
+     ungroup %>%  # remove grouping criteria
+     select(-key) %>%  # remove the 'key' column -- not needed
+     spread(newkey, value) %>%  # long to wide
+     select(-label)  # remove the 'label' column -- not needed
Source: local data frame [2 x 4]

  names_1 names_2 score_1 score_2
    (chr)   (chr)   (chr)   (chr)
1    NG_1    FG_1       1       3
2    NG_2    FG_2       2       4
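
In the output above the score columns come back as character, because gather stacks names and score into a single value column. With tidyr 1.0.0 or later, pivot_wider can spread several value columns at once and keep their types. A sketch under that assumption; the `id` column and the result name `wide` are introduced here for illustration:

```r
library(dplyr)
library(tidyr)

x <- data.frame(names = c('NG_1', 'NG_2', 'FG_1', 'FG_2'),
                score = c(1, 2, 3, 4),
                label = c('N', 'N', 'F', 'F'),
                stringsAsFactors = FALSE)

wide <- x %>%
    mutate(id = sub("^[^_]*_", "", names)) %>%  # shared part of the name: "1", "2"
    pivot_wider(id_cols = id,                   # one output row per id
                names_from = label,             # N / F become column suffixes
                values_from = c(names, score))  # spread both columns at once
wide
```

The result should have one row per id, with columns such as names_N, names_F, score_N and score_F, and the scores stay numeric.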
