Distinguishing the levels of a factor variable in R
Suppose my dataset has three columns: id (identifier), case (character) and value (numeric). Here is my dataset:
tdata <- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), case=c("a","b","c","c","a","b","c","c","a","b","c","c","a","b","c","c"), value=c(1,34,56,23,546,34,67,23,65,23,65,23,87,34,321,56))
tdata
id case value
1 1 a 1
2 1 b 34
3 1 c 56
4 1 c 23
5 2 a 546
6 2 b 34
7 2 c 67
8 2 c 23
9 3 a 65
10 3 b 23
11 3 c 65
12 3 c 23
13 4 a 87
14 4 b 34
15 4 c 321
16 4 c 56
As you can see, each id has two c's. How can I rename them to c1 and c2? (I need to tell them apart for further analysis.)
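How about:
within(tdata, case <- ave(as.character(case), id, FUN=make.unique))
Note that make.unique leaves the first occurrence unchanged and only suffixes later duplicates, so within each id the two c's become "c" and "c.1" rather than "c1" and "c2".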
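If you need exactly the c1/c2 labels the question asks for, the same ave() idea can number every case within each id instead of calling make.unique. This is a minimal base R sketch, not part of the original answer; the helper vector name `occurrence` is purely illustrative.

```r
# Sample data from the question
tdata <- data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                    case = c("a","b","c","c","a","b","c","c","a","b","c","c","a","b","c","c"),
                    value = c(1,34,56,23,546,34,67,23,65,23,65,23,87,34,321,56))

# Running count of each case label within each id:
# 1 for the first "c" in an id, 2 for the second, and so on
occurrence <- ave(seq_along(tdata$case), tdata$id, tdata$case, FUN = seq_along)

# Append the counter to the label; as.character() also covers R < 4.0,
# where data.frame() turns case into a factor by default
tdata$case <- paste0(as.character(tdata$case), occurrence)
tdata
```

Because every label gets a suffix (a1, b1, c1, c2), the naming stays uniform even if some other case value is later duplicated.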
Rather than replacing the values in the "case" column, I suggest simply adding a secondary "ID" column. This is easy to do with getanID from my "splitstackshape" package.
library(splitstackshape)
getanID(tdata, c("id", "case"))[]
# id case value .id
# 1: 1 a 1 1
# 2: 1 b 34 1
# 3: 1 c 56 1
# 4: 1 c 23 2
# 5: 2 a 546 1
# 6: 2 b 34 1
# 7: 2 c 67 1
# 8: 2 c 23 2
# 9: 3 a 65 1
# 10: 3 b 23 1
# 11: 3 c 65 1
# 12: 3 c 23 2
# 13: 4 a 87 1
# 14: 4 b 34 1
# 15: 4 c 321 1
# 16: 4 c 56 2
Depending on which version of "data.table" you have installed, the trailing [] may or may not be needed.
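If you already work with data.table, a similar helper column can be built without splitstackshape. The sketch below assumes a reasonably recent data.table that provides rowid(); it is an alternative, not a description of how getanID works internally.

```r
library(data.table)

# Sample data from the question
tdata <- data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                    case = c("a","b","c","c","a","b","c","c","a","b","c","c","a","b","c","c"),
                    value = c(1,34,56,23,546,34,67,23,65,23,65,23,87,34,321,56))

dt <- as.data.table(tdata)

# rowid() numbers rows 1, 2, ... within each (id, case) combination,
# giving the same kind of .id column as in the output above
dt[, .id := rowid(id, case)]
dt
```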
If you really do want to collapse those two columns into one, you can do that too:
getanID(tdata, c("id", "case"))[, case := paste0(case, .id)][, .id := NULL][]
# id case value
# 1: 1 a1 1
# 2: 1 b1 34
# 3: 1 c1 56
# 4: 1 c2 23
# 5: 2 a1 546
# 6: 2 b1 34
# 7: 2 c1 67
# 8: 2 c2 23
# 9: 3 a1 65
# 10: 3 b1 23
# 11: 3 c1 65
# 12: 3 c2 23
# 13: 4 a1 87
# 14: 4 b1 34
# 15: 4 c1 321
# 16: 4 c2 56
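The same collapsed result can also be sketched directly with data.table, again assuming rowid() is available in your installed version; here the counter is pasted onto case in a single step instead of creating and then dropping a helper column.

```r
library(data.table)

# Sample data from the question
tdata <- data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                    case = c("a","b","c","c","a","b","c","c","a","b","c","c","a","b","c","c"),
                    value = c(1,34,56,23,546,34,67,23,65,23,65,23,87,34,321,56))

dt <- as.data.table(tdata)

# Overwrite case with the label plus its occurrence number within (id, case);
# paste0() converts a factor column to its labels, so this also works in R < 4.0
dt[, case := paste0(case, rowid(id, case))]
dt
```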
How about this slightly modified approach:
library(dplyr)
tdata %>% group_by(id, case) %>% mutate(caseNo = paste0(case, row_number())) %>%
ungroup() %>% select(-case)
#Source: local data frame [16 x 3]
#
# id value caseNo
#1 1 1 a1
#2 1 34 b1
#3 1 56 c1
#4 1 23 c2
#5 2 546 a1
#6 2 34 b1
#7 2 67 c1
#8 2 23 c2
#9 3 65 a1
#10 3 23 b1
#11 3 65 c1
#12 3 23 c2
#13 4 87 a1
#14 4 34 b1
#15 4 321 c1
#16 4 56 c2