I have a data.table where the data I want is structured in a diagonal fashion:
library(data.table)
month <- c(201406, 201406, 201406, 201406, 201406, 201406, 201406, 201406,
201406, 201406, 201406, 201406)
code <- c("498A01", "498A01", "498A01", "498A01", "498A01", "498A01", "498A01", "498A01",
"498A01", "498A01", "498A01", "498A01")
col.a <- c("service", "base charge", "", "", "", "", "", "", "", "", "", "")
col.b <- c("", "", "description", "per unit", "", "", "", "", "", "", "", "")
col.c <- c("", "", "", "", "rate", 6859, "", "", "", "", "", "")
col.d <- c("", "", "", "", "", "", "quantity", 1, "", "", "", "")
col.e <- c("", "", "", "", "", "", "", "", "total charge", 6859, "", "")
col.f <- c("", "", "", "", "", "", "", "", "", "", "", "")
dt <- data.table(month, code, col.a, col.b, col.c, col.d, col.e, col.f)
However, I need to organize the data in a more coherent fashion to simplify dt. I am fairly new to data.table and I was wondering if there was a straightforward way to do so. For col.a, I know the following works for one column:
dt <- dt[col.a != "", 1:8, by = .(code, month)]
But when I try for multiple columns it returns a data table with 0 obs. I suppose I could do that for all of the columns and then do some kind of merge but that seems inefficient and cumbersome. Is there a better way?
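A minimal sketch of why filtering on several columns at once comes back empty, assuming the conditions were combined with `&` (the question does not show the exact call). The table `toy` is a hypothetical, cut-down stand-in for `dt`:

```r
library(data.table)

# Hypothetical stand-in for the question's table: values sit on a diagonal.
toy <- data.table(
  code  = "498A01",
  col.a = c("service", "base charge", "", ""),
  col.b = c("", "", "description", "per unit")
)

# Each condition on its own matches two rows...
nrow(toy[col.a != ""])
nrow(toy[col.b != ""])

# ...but no single row has both columns filled, so the combined filter is empty.
nrow(toy[col.a != "" & col.b != ""])
```

Because of the diagonal layout, a row filter can only ever keep one column's values at a time, so the empty cells have to be dropped column by column within each group instead.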
My desired output is:
    month   code       col.a       col.b col.c    col.d        col.e col.f
1: 201406 498A01     service description  rate quantity total charge
2: 201406 498A01 base charge    per unit  6859        1         6859
So for each unique combination of code and month, I want to remove the empty cells and collapse the data to look like it does above. I need to keep col.f because it may not always be blank.
Any suggestions would be greatly appreciated.
Are you looking for something like this?
dt[, lapply(.SD, function(x) x[x!=""][1:2]), by=.(month, code)]
output:
    month   code       col.a       col.b col.c    col.d        col.e col.f
1: 201406 498A01     service description  rate quantity total charge  <NA>
2: 201406 498A01 base charge    per unit  6859        1         6859  <NA>
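The `[1:2]` above hard-codes two values per column. If the number of non-empty cells can differ between groups, one possible variation is to pad every column to the longest one in the group. This is only a sketch: `collapse_group` is a hypothetical helper name, and the sample table is a shortened version of the one in the question.

```r
library(data.table)

# Shortened version of the question's table (fewer columns, same layout).
dt <- data.table(
  month = 201406,
  code  = "498A01",
  col.a = c("service", "base charge", "", "", "", ""),
  col.b = c("", "", "description", "per unit", "", ""),
  col.f = c("", "", "", "", "", "")
)

# Hypothetical helper: drop empty strings from each column, then pad every
# column with NA up to the length of the longest one in the group.
collapse_group <- function(sd) {
  vals <- lapply(sd, function(x) x[x != ""])
  n <- max(lengths(vals))
  lapply(vals, function(v) v[seq_len(n)])  # indexing past the end yields NA
}

res <- dt[, collapse_group(.SD), by = .(month, code)]
res
```

A column that is entirely blank, such as col.f here, comes back as NA and so is kept in the result. If empty strings are preferred over NA, they can be substituted afterwards.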
Or in base R:
do.call(rbind, by(dt, paste(dt$month, dt$code),
function(y) do.call(cbind, lapply(y, function(x) x[x!=""][1:2]))))
output:
     month    code     col.a         col.b         col.c  col.d      col.e          col.f
[1,] "201406" "498A01" "service"     "description" "rate" "quantity" "total charge" NA
[2,] "201406" "498A01" "base charge" "per unit"    "6859" "1"        "6859"         NA