Can I list the unique values for one column while grouping by another column in R?
I have the following columns:
session condition codes
15 anxiety 1
15 depression 1
15 bipolar 1
15 high blood pressure 3
15 panic attacks 1
66 hypertension 5
66 high blood pressure 3
66 anxiety 1
66 panic attacks 1
75 schizophrenia 1
32 muscular dystrophy 4
32 anxiety 1
32 depression 1
32 panic attacks 1
I want to make a new column with just the unique codes per session and then leave the rest of the rows for that session blank. I know this logically doesn't make sense, because this new column doesn't really line up row by row with the others. If it needs to be in a new object or list or something, that is fine.
session condition codes unique_codes
15 anxiety 1 1
15 depression 1 3
15 bipolar 1
15 high blood pressure 3
15 panic attacks 1
66 hypertension 5 5
66 high blood pressure 3 3
66 anxiety 1 1
66 panic attacks 1
75 schizophrenia 1 1
32 muscular dystrophy 4 4
32 anxiety 1 1
32 depression 1
32 panic attacks 1
I have tried:
conditions=conditions %>%
group_by(session)%>%
mutate(unique_codes=unique(conditions$codes))
However, I get an error that says "must be length 5 (the group size) or one, not 4", which I assume is because I want the rest of the rows blank. Does anyone know a way around this? Thank you!!
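The lengths are the issue: mutate() needs a result of length 1 or of the group size, and unique(conditions$codes) returns the unique values of the whole column (4 of them) rather than of each group. We can either paste the unique values together or create a list column: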
library(dplyr)
conditions %>%
group_by(session)%>%
mutate(unique_codes = toString(unique(codes)))
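The list-column alternative mentioned above is not shown in the answer. A minimal sketch of it follows, assuming dplyr is installed; `conditions_small` and `codes_by_session` are hypothetical names, and the data is a cut-down stand-in for `conditions`. It collapses to one row per session, which fits the "new object" the question allows.

```r
library(dplyr)

# Small stand-in for `conditions` (same columns, fewer rows); hypothetical name
conditions_small <- data.frame(
  session   = c(15L, 15L, 15L, 66L, 66L),
  condition = c("anxiety", "high blood pressure", "panic attacks",
                "hypertension", "anxiety"),
  codes     = c(1L, 3L, 1L, 5L, 1L)
)

# One row per session; unique_codes is a list column,
# so each cell can hold a vector of any length
codes_by_session <- conditions_small %>%
  group_by(session) %>%
  summarise(unique_codes = list(unique(codes)))

# Inspect the unique codes stored for each session
codes_by_session$unique_codes
```

Because each cell of a list column holds a whole vector, no padding is needed, at the cost of a column that is less convenient to print or export.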
Or another option is to make the length the same as the group size by padding with NA at the end:
conditions %>%
group_by(session) %>%
mutate(unique_codes = `length<-`(unique(codes), n()))
# A tibble: 14 x 4
# Groups: session [4]
# session condition codes unique_codes
# <int> <chr> <int> <int>
# 1 15 anxiety 1 1
# 2 15 depression 1 3
# 3 15 bipolar 1 NA
# 4 15 high blood pressure 3 NA
# 5 15 panic attacks 1 NA
# 6 66 hypertension 5 5
# 7 66 high blood pressure 3 3
# 8 66 anxiety 1 1
# 9 66 panic attacks 1 NA
#10 75 schizophrenia 1 1
#11 32 muscular dystrophy 4 4
#12 32 anxiety 1 1
#13 32 depression 1 NA
#14 32 panic attacks 1 NA
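To see what the `length<-` call is doing in isolation, here is a short base R sketch on the codes of session 15 alone. The variable names `x` and `padded` are only for illustration.

```r
# Codes recorded for session 15 in the example data
x <- unique(c(1L, 1L, 1L, 3L, 1L))
x
# [1] 1 3

# `length<-` is the replacement function behind `length(x) <- 5`;
# called directly, it returns a copy extended to the new length with NA
padded <- `length<-`(x, 5)
padded
# [1]  1  3 NA NA NA
```

Calling the replacement function directly is what lets it sit inside mutate() as a single expression, with n() supplying the group size.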
The OP mentioned that n() was not working (this could be a dplyr version issue). In that case, length(codes) should work:
conditions %>%
group_by(session) %>%
mutate(unique_codes = `length<-`(unique(codes), length(codes)))
The example data used above:
conditions <- structure(list(session = c(15L, 15L, 15L, 15L, 15L, 66L, 66L,
66L, 66L, 75L, 32L, 32L, 32L, 32L), condition = c("anxiety",
"depression", "bipolar", "high blood pressure", "panic attacks",
"hypertension", "high blood pressure", "anxiety", "panic attacks",
"schizophrenia", "muscular dystrophy", "anxiety", "depression",
"panic attacks"), codes = c(1L, 1L, 1L, 3L, 1L, 5L, 3L, 1L, 1L,
1L, 4L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-14L))
Another dplyr option could be (here df is the input data frame, called conditions above):
df %>%
group_by(session) %>%
distinct(codes) %>%
transmute(unique_codes = codes,
rowid = 1:n()) %>%
right_join(df %>%
group_by(session) %>%
mutate(rowid = 1:n())) %>%
ungroup() %>%
select(-rowid)
session unique_codes condition codes
<int> <int> <chr> <int>
1 15 1 anxiety 1
2 15 3 depression 1
3 15 NA bipolar 1
4 15 NA high blood pressure 3
5 15 NA panic attacks 1
6 66 5 hypertension 5
7 66 3 high blood pressure 3
8 66 1 anxiety 1
9 66 NA panic attacks 1
10 75 1 schizophrenia 1
11 32 4 muscular dystrophy 4
12 32 1 anxiety 1
13 32 NA depression 1
14 32 NA panic attacks 1
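For comparison, the same padding idea can be written without dplyr. This is a sketch only, not something proposed in either answer: it uses base R's ave(), and `pad_unique` is a hypothetical helper name. The condition column is left out to keep the example short.

```r
# Example data (session and codes columns from the dput() shown earlier)
conditions <- data.frame(
  session = c(15L, 15L, 15L, 15L, 15L, 66L, 66L, 66L, 66L,
              75L, 32L, 32L, 32L, 32L),
  codes   = c(1L, 1L, 1L, 3L, 1L, 5L, 3L, 1L, 1L,
              1L, 4L, 1L, 1L, 1L)
)

# Hypothetical helper: unique values of x, padded with NA to length(x)
pad_unique <- function(x) `length<-`(unique(x), length(x))

# ave() applies the helper within each session and keeps the row order
conditions$unique_codes <- ave(conditions$codes, conditions$session,
                               FUN = pad_unique)

conditions
```

Since ave() requires the function to return a vector as long as its input, the NA padding is what makes this work.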