I'm trying to use the following R data.table to create multiple columns out of the "Ref" field:
library(data.table)
(dt= data.table(Ref = c("R", "STOP", "STOP_TS", "P", "M", "STOP_P_R"),
Qty= c(2,4,6,8,10,12)))
The new columns should be based on single refs only (e.g. "STOP" and "TS") as opposed to combined refs (e.g. "STOP_TS"). Single refs are identified by splitting on the "_" separator. For each row, the new column should take the value of the "Qty" field if that single ref is present, and zero otherwise. The desired output should look like this:
#Desired Output
(desired=data.table(
Ref= c("R", "STOP", "STOP_TS", "P", "M", "STOP_P_R"),
Qty= c(2,4,6,8,10,12),
R = c(2,0,0,0,0,12),
STOP= c (0,4,6,0,0,12),
TS= c(0,0,6,0,0,0),
P= c(0,0,0,8,0,12),
M=c(0,0,0,0,10,0)))
The problem with my approach is that the regex wrongly matches "P" inside "STOP", because the pattern does not require a match on a complete 'word'.
library(foreach)
library(data.table)
ref<-unlist(unique(dt$Ref)) #extract unique combined ref
ref2<-strsplit(ref, "_") #split ref by using "_"
ref3<-unique(unlist(ref2)) #extract unique single ref (columns to create)
dt2<-foreach(i=1:length(ref3), .combine='cbind')%do%{
eval(parse(text=paste0("tmp<-ifelse( grepl(ref3[i], dt$Ref), dt$Qty,0)")))
data.table(tmp)
}
names(dt2)<-ref3
(dt3=cbind(dt,dt2))
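To make the false match concrete, here is a minimal sketch comparing a plain substring pattern with one anchored on token boundaries (start of string, end of string, or "_"). The vector `refs` and the anchored pattern are illustrative assumptions, not part of the code above.

```r
refs <- c("P", "STOP", "STOP_P_R")

# Plain substring match: "P" is also found inside "STOP"
substring_hit <- grepl("P", refs)

# Match "P" only as a complete token delimited by "_" or the string ends
token_hit <- grepl("(^|_)P(_|$)", refs)

substring_hit  # TRUE TRUE TRUE
token_hit      # TRUE FALSE TRUE
```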
As a way to check, the sum of column "P" should be 20 (8 for Ref="P" and 12 for Ref="STOP_P_R").
I'd appreciate any comments or suggestions on this.
An option is to split the column with separate_rows, then reshape it to wide format with pivot_wider, and bind the result to the original dataset with bind_cols
library(dplyr)
library(tidyr)
dt %>%
mutate(rn = row_number()) %>%
separate_rows(Ref) %>%
pivot_wider(names_from = Ref, values_from = Qty,
values_fill = list(Qty = 0)) %>%
select(-rn) %>%
bind_cols(dt, .)
# Ref Qty R STOP TS P M
#1: R 2 2 0 0 0 0
#2: STOP 4 0 4 0 0 0
#3: STOP_TS 6 0 6 6 0 0
#4: P 8 0 0 0 8 0
#5: M 10 0 0 0 0 10
#6: STOP_P_R 12 12 12 0 12 0
Or using dcast from data.table
library(splitstackshape)
library(data.table)
# copy(dt) so that := does not add the helper column rn to dt by reference
cbind(dt, dcast(cSplit(copy(dt)[, rn := seq_len(.N)], 'Ref', '_', "long"),
rn ~ Ref, value.var = 'Qty', fill = 0)[, rn := NULL])
We can use cSplit_e from splitstackshape to get the data in binary format for each row, separating on "_". We can then replace all the 1's with the corresponding Qty value.
data <- data.frame(splitstackshape::cSplit_e(dt, "Ref", sep = "_",
type = "character", fill = 0))
cols <- grep('Ref_', names(data))
mat <- which(data[cols] == 1, arr.ind = TRUE)
data[cols][mat] <- data$Qty[mat[, 1]]
data
# Ref Qty Ref_M Ref_P Ref_R Ref_STOP Ref_TS
#1 R 2 0 0 2 0 0
#2 STOP 4 0 0 0 4 0
#3 STOP_TS 6 0 0 0 6 6
#4 P 8 0 8 0 0 0
#5 M 10 10 0 0 0 0
#6 STOP_P_R 12 0 12 12 12 0
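The partial-match problem from the question can also be avoided without extra packages by comparing whole tokens rather than using a regex. The following is a minimal sketch under that assumption: it splits each Ref on "_" and tests membership with %in%. The names `tokens`, `single_refs`, `hit` and `res` are made up for illustration.

```r
library(data.table)

dt <- data.table(Ref = c("R", "STOP", "STOP_TS", "P", "M", "STOP_P_R"),
                 Qty = c(2, 4, 6, 8, 10, 12))

# One character vector of single refs per row
tokens <- strsplit(dt$Ref, "_", fixed = TRUE)

# Unique single refs, in order of first appearance: R, STOP, TS, P, M
single_refs <- unique(unlist(tokens))

res <- copy(dt)  # leave the original dt untouched
for (r in single_refs) {
  # TRUE where the row contains r as a complete token
  hit <- vapply(tokens, function(tk) r %in% tk, logical(1))
  set(res, j = r, value = ifelse(hit, res$Qty, 0))
}

res
sum(res$P)  # 20, the check mentioned in the question
```

Because the comparison is on complete tokens, "P" no longer matches "STOP", and the columns come out in the same order as in the desired output.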