This is a basic question, but I'm stumped:
I have the following R data.table:
library(data.table)
DT <- fread('unique_point biased data_points team groupID
up1 FALSE 3 A xy28352
up1 TRUE 4 A xy28352
up2 FALSE 1 A xy28352
up2 TRUE 0 X xy28352
up3 FALSE 12 Y xy28352
up3 TRUE 35 Z xy28352')
which prints out as
> DT
unique_point biased data_points team groupID
1: up1 FALSE 3 A xy28352
2: up1 TRUE 4 A xy28352
3: up2 FALSE 1 A xy28352
4: up2 TRUE 0 X xy28352
5: up3 FALSE 12 Y xy28352
6: up3 TRUE 35 Z xy28352
The values for the column team
are letters A to Z, 26 possibilities. At the moment. If I count the row values with this code:
DT[, counts := .N, by=c("team")]
which gives
> DT
unique_point biased data_points team groupID counts
1: up1 FALSE 3 A xy28352 3
2: up1 TRUE 4 A xy28352 3
3: up2 FALSE 1 A xy28352 3
4: up2 TRUE 0 X xy28352 1
5: up3 FALSE 12 Y xy28352 1
6: up3 TRUE 35 Z xy28352 1
I would like to create 26 new columns in DT
which gives the size of each team
, A
, B
, C
, etc.
The resulting data.table would look like:
> DT
unique_point biased data_points team groupID A B C ... Z
1: up1 FALSE 3 A xy28352 3 0 0 ... 1
2: up1 TRUE 4 A xy28352 3 0 0 ... 1
3: up2 FALSE 1 A xy28352 3 0 0 ... 1
4: up2 TRUE 0 X xy28352 3 0 0 ... 1
5: up3 FALSE 12 Y xy28352 3 0 0 ... 1
6: up3 TRUE 35 Z xy28352 3 0 0 ... 1
I'm not sure how one does this with data.table
syntax..
EDIT: I'm happy to do this with base R and dplyr as well.
What about plyr
, is that ok?
library(data.table)
library(plyr)
DT <- fread('unique_point biased data_points team groupID
up1 FALSE 3 A xy28352
up1 TRUE 4 A xy28352
up2 FALSE 1 A xy28352
up2 TRUE 0 X xy28352
up3 FALSE 12 Y xy28352
up3 TRUE 35 Z xy28352')
ldply(LETTERS, function(x){
n <- nrow(DT[team == as.character(x),])
DT[, as.character(x) := n]
return(DT[team == x,])
})
> DT
unique_point biased data_points team groupID A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1: up1 FALSE 3 A xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
2: up1 TRUE 4 A xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
3: up2 FALSE 1 A xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
4: up2 TRUE 0 X xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
5: up3 FALSE 12 Y xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
6: up3 TRUE 35 Z xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
This is an unusual solution but it works. I used dplyr
and tidyr
DT[, counts := .N, by=c("team")]
x <- data.frame(team = sample(LETTERS,26))%>%arrange(team)
y <- DT%>%select(team,counts)%>%unique()
df <- x%>%left_join(y,"team")%>%spread(team, counts,fill = 0)
cbind(DT,df)
Note: the left_join does throw a warning message but doesn't tamper with the output and there's a work around dplyr join warning: joining factors with different levels
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.