Partitioning a data.frame based on a condition

Question

I have a data.frame shaped like this:

c <- data.frame(name=c("a", "a", "b", "b", "c", "c","d","d"), value=c(1,3,2,4,5,3,4,5), address=c("rrrr","rrrr","zzzz","aaaa","ssss","jjjj","qqqq","qqqq"))
> c
  name value address
1    a     1    rrrr
2    a     3    rrrr
3    b     2    zzzz 
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj
7    d     4    qqqq
8    d     5    qqqq

I am trying to split this data frame into two separate data frames according to one simple rule: put the people who didn't change address in one group and the people who changed address in the other. Any hint on how to accomplish this?

So far I have been experimenting, to no avail, with:

for(i in seq(1,8, by=2)){
    print(i)
    print(unlist(c[which(c[i,3]==c[(i+1),3]),]))    
}

Answer 1

This counts the number of distinct addresses for each name and splits on that basis. There was one hurdle to get over: ave kept returning <NA> until I wrapped the address column (a factor here) in as.character. There was a warning message, and I'm copying its beginning so that searchers might be able to find this:

Warning messages:
1: In `[<-.factor`(`*tmp*`, i, value = c(1L, 1L)) :
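
The hurdle can be reproduced on a small vector. This is a minimal sketch, not the answerer's code: the vectors `address` and `name` and the helper `count_unique` are illustrative names. It assumes the address column is a factor, which was the data.frame default before R 4.0.0.

```r
# Illustrative data: two names, the first with one address, the second with two
address <- factor(c("rrrr", "rrrr", "zzzz", "aaaa"))
name <- c("a", "a", "b", "b")

# Hypothetical helper: number of distinct values in x
count_unique <- function(x) sum(!duplicated(x))

# On a factor, ave() tries to write the counts back into the factor.
# The counts are not valid levels, so every element becomes NA (with a warning).
as_factor <- suppressWarnings(ave(address, name, FUN = count_unique))

# On a character vector, the counts are written back as strings.
as_char <- ave(as.character(address), name, FUN = count_unique)
```

Because ave returns a vector of the same type as its first argument, the counts in `as_char` are strings rather than numbers, which is still enough for split to group on.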

The successful version (using a data object named cc):

 split(cc,  ave(as.character(cc$address), cc$name, FUN=function(x) sum(!duplicated(x)) ) )

$`1`
  name value address
1    a     1    rrrr
2    a     3    rrrr
7    d     4    qqqq
8    d     5    qqqq

$`2`
  name value address
3    b     2    zzzz
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj

If you really want a two-way split, convert the counts to logical with > 1:

 split(cc, ave(as.character(cc$address), cc$name, FUN=function(x) sum(!duplicated(x)) ) >1)

$`FALSE`
  name value address
1    a     1    rrrr
2    a     3    rrrr
7    d     4    qqqq
8    d     5    qqqq

$`TRUE`
  name value address
3    b     2    zzzz
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj
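
Since the question asks for two separate data frames, the pieces of the split can be pulled out by name. The following is a hedged sketch under a few assumptions: the columns are character vectors rather than factors, and the names `addr_code`, `n_addr`, `parts`, `unchanged` and `changed` are illustrative. It also encodes the addresses as integer codes so that ave returns integer counts instead of strings.

```r
# Same data as in the question, built with character columns
cc <- data.frame(
  name    = c("a", "a", "b", "b", "c", "c", "d", "d"),
  value   = c(1, 3, 2, 4, 5, 3, 4, 5),
  address = c("rrrr", "rrrr", "zzzz", "aaaa", "ssss", "jjjj", "qqqq", "qqqq"),
  stringsAsFactors = FALSE
)

# Integer codes for the addresses, so ave() hands back integer counts
addr_code <- as.integer(factor(cc$address))

# Number of distinct addresses per name, repeated on every row of that name
n_addr <- ave(addr_code, cc$name, FUN = function(x) length(unique(x)))

# Split on a readable label and extract the two data frames
parts <- split(cc, ifelse(n_addr > 1, "changed", "unchanged"))
unchanged <- parts$unchanged
changed <- parts$changed
```

Using labels instead of TRUE and FALSE avoids having to backtick the list names when extracting the pieces.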

I don't understand the comment. This is what I get from str(dat):

List of 2
 $ FALSE:'data.frame':  4 obs. of  3 variables:
  ..$ name   : Factor w/ 4 levels "a","b","c","d": 1 1 4 4
  ..$ value  : num [1:4] 1 3 4 5
  ..$ address: Factor w/ 6 levels "aaaa","jjjj",..: 4 4 3 3
 $ TRUE :'data.frame':  4 obs. of  3 variables:
  ..$ name   : Factor w/ 4 levels "a","b","c","d": 2 2 3 3
  ..$ value  : num [1:4] 2 4 5 3
  ..$ address: Factor w/ 6 levels "aaaa","jjjj",..: 6 1 5 2

Answer 2

Using dplyr:

library(dplyr)
z<-c %>% group_by(name) %>% 
         mutate(changed = n_distinct(address))
split(z, z$changed)

Thanks to @akrun for reminding me of n_distinct.
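
For a strict two-way split with dplyr, the count can be turned into a logical flag before splitting. This is only a sketch along the lines of the answer above, assuming dplyr is installed; the names `people`, `flagged` and `parts` are illustrative and not part of the original answer.

```r
library(dplyr)

# Same data as in the question, built with character columns
people <- data.frame(
  name    = c("a", "a", "b", "b", "c", "c", "d", "d"),
  value   = c(1, 3, 2, 4, 5, 3, 4, 5),
  address = c("rrrr", "rrrr", "zzzz", "aaaa", "ssss", "jjjj", "qqqq", "qqqq"),
  stringsAsFactors = FALSE
)

# Flag every row whose name is associated with more than one address
flagged <- people %>%
  group_by(name) %>%
  mutate(changed = n_distinct(address) > 1) %>%
  ungroup()

# A list with elements named "FALSE" (same address) and "TRUE" (changed address)
parts <- split(flagged, flagged$changed)
```

The two pieces are then available as parts[["FALSE"]] and parts[["TRUE"]].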

Answer 3

@jeremycg's answer is great, and I am trying to learn dplyr myself, but here is a non-dplyr version as well.

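# Count the distinct addresses for each name
numAddresses <- sapply(split(c, c$name), function(x)
    length(unique(x$address)))
# Look up each row's count by name (numAddresses is named by name, not address)
split(c, numAddresses[as.character(c$name)])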

3 answers

Answer 1: score 2, accepted, 2015-06-23 17:22:37
Answer 2: score 1, 2015-06-23 17:07:41
Answer 3: score 0, 2015-06-23 17:11:47
