I have the following data, df1:
df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
year=c(2014,2014,2015,2015),
month=c(1,2),
new.employee=c(4,6,2,6,23,2,5,34))
  id year month new.employee
1  A 2014     1            4
2  A 2014     2            6
3  A 2015     1            2
4  A 2015     2            6
5  B 2014     1           23
6  B 2014     2            2
7  B 2015     1            5
8  B 2015     2           34
and I get the desired outcome with the following code:
library(data.table) # V1.9.6+
temp <- setDT(df1)[month == 2L, .(id, frank(-new.employee)), by = year]
df1[temp, new.employee.rank := i.V2, on = c("year", "id")]
df1
#    id year month new.employee new.employee.rank
# 1:  A 2014     1            4                 1
# 2:  A 2014     2            6                 1
# 3:  A 2015     1            2                 2
# 4:  A 2015     2            6                 2
# 5:  B 2014     1           23                 2
# 6:  B 2014     2            2                 2
# 7:  B 2015     1            5                 1
# 8:  B 2015     2           34                 1
Now I want to wrap this in a user-defined function so that I can vary the input column, which is new.employee in the example above. I tried a few approaches, but none of them worked:
The first try:
myRank <- function(data, var) {
  temp <- setDT(data)[month == 2L, .(id, frank(-var)), by = year]
  data[temp, new.employee.rank := i.V2, on = c("year", "id")]
  return(data)
}
myRank(df1, new.employee)
Error in is.data.frame(x) : object 'new.employee' not found
The second try:
myRank(df1, df1$new.employee)
Nothing was printed.
The third try, where I changed the function a bit:
myRank <- function(data, var) {
  temp <- setDT(data)[month == 2L, .(id, rank(data$var)), by = year]
  data[temp, new.employee.rank := i.V2, on = c("year", "id")]
  return(data)
}
myRank(df1, df1$new.employee)
Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In `[.data.table`(setDT(data), month == 2L, .(id, rank(data$var)), :
  Item 2 of j's result for group 1 is zero length. This will be filled with 2 NAs to match the longest column in this result. Later groups may have a similar problem but only the first is reported to save filling the warning buffer.
3: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
I looked at similar problems but my R experience is not good enough to understand those.
data.table uses non-standard evaluation by default (unless you start to mess around with with = FALSE), so you need to refer to your column by name or, alternatively, use get. Another problem with your code (as mentioned in the comments) is that you are calling new.employee while it is not defined outside the scope of df1. If you want to prevent R from evaluating it before you pass it to your data set, you can use the deparse(substitute(var)) combination, which prevents evaluation and then converts var to a character string. That string can in turn be passed to get or to the eval(as.name()) combination (these do entirely different things, but within the data.table scope they lead to the same result). Finally, there is the printing issue after using := within a function. Even if everything works, return(data) won't print anything; you need to force printing, either by adding an extra [] or by explicitly calling print.
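To see the capture step in isolation, here is a minimal base R sketch. The helper names col_name and col_values and the small data frame df are hypothetical, introduced only for illustration, and eval(as.name(var), data) is used here as a plain data.frame stand-in for what happens inside the data.table scope.

```r
# Hypothetical helper: return the unquoted argument as a character string.
col_name <- function(var) deparse(substitute(var))

# Hypothetical helper: capture the name, then look it up inside the data frame.
col_values <- function(data, var) {
  var <- deparse(substitute(var))  # character string; the argument is never evaluated
  eval(as.name(var), data)         # evaluate the symbol with data as the scope
}

# Illustrative data, not the df1 from the question.
df <- data.frame(id = c("A", "B"), new.employee = c(6, 2))

col_name(new.employee)        # "new.employee"
col_values(df, new.employee)  # 6 2
```

Because the argument is only ever inspected with substitute, no object called new.employee has to exist outside the data frame.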
Here's a possible solution:
myRank <- function(data, var) {
var <- deparse(substitute(var)) ## <~~~ Note this
temp <- setDT(data)[month == 2L, .(id, frank(-get(var))), by = year] ## <~~ Note the get
data[temp, new.employee.rank := i.V2, on = c("year", "id")][] ## <~~ Note the []
}
myRank(df1, new.employee)
#    id year month new.employee new.employee.rank
# 1:  A 2014     1            4                 1
# 2:  A 2014     2            6                 1
# 3:  A 2015     1            2                 2
# 4:  A 2015     2            6                 2
# 5:  B 2014     1           23                 2
# 6:  B 2014     2            2                 2
# 7:  B 2015     1            5                 1
# 8:  B 2015     2           34                 1
Or
myRank <- function(data, var) {
var <- as.name(deparse(substitute(var))) ## <~~~ Note additional as.name
temp <- setDT(data)[month == 2L, .(id, frank(-eval(var))), by = year] ## <~ Note the eval
data[temp, new.employee.rank := i.V2, on = c("year", "id")][]
}
myRank(df1, new.employee)
#    id year month new.employee new.employee.rank
# 1:  A 2014     1            4                 1
# 2:  A 2014     2            6                 1
# 3:  A 2015     1            2                 2
# 4:  A 2015     2            6                 2
# 5:  B 2014     1           23                 2
# 6:  B 2014     2            2                 2
# 7:  B 2015     1            5                 1
# 8:  B 2015     2           34                 1
I would guess the second option will be faster, as it avoids extracting the whole column out of data.
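That guess could be checked with a rough timing sketch along the following lines. The helpers rank_get and rank_eval (which keep only the ranking step of each variant) and the synthetic table big are assumptions made up for this comparison; they are not part of the original code, and the timings will depend on the machine and the data.table version.

```r
library(data.table)

# Hypothetical helpers: only the ranking step of each variant, for timing.
rank_get <- function(data, var) {
  var <- deparse(substitute(var))
  data[month == 2L, .(id, rnk = frank(-get(var))), by = year]
}

rank_eval <- function(data, var) {
  var <- as.name(deparse(substitute(var)))
  data[month == 2L, .(id, rnk = frank(-eval(var))), by = year]
}

# Synthetic data (an assumption, not from the question): 50,000 ids per year.
set.seed(1)
n <- 50000L
big <- data.table(
  id = rep(seq_len(n), times = 2),
  year = rep(c(2014, 2015), each = n),
  month = 2L,
  new.employee = sample.int(1000L, 2L * n, replace = TRUE)
)

# Both variants should produce the same ranks.
stopifnot(isTRUE(all.equal(rank_get(big, new.employee),
                           rank_eval(big, new.employee))))

# Rough timings over 20 repetitions each.
system.time(for (i in 1:20) rank_get(big, new.employee))
system.time(for (i in 1:20) rank_eval(big, new.employee))
```

For anything more than a quick check, a dedicated benchmarking package would give more reliable numbers than system.time.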
As a side note, you could also build the name of the new variable dynamically by replacing
new.employee.rank := i.V2
with something like
paste0("New.", var, ".rank") := i.V2
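Putting the side note together with the first solution, a complete version might look like the sketch below. The intermediate names new_col and rnk are my own additions for readability (the original relies on the auto-generated V2), and wrapping new_col in parentheses on the left of := is one way to tell data.table to use its value as the column name.

```r
library(data.table)

df1 <- data.frame(id = c("A", "A", "A", "A", "B", "B", "B", "B"),
                  year = c(2014, 2014, 2015, 2015),
                  month = c(1, 2),
                  new.employee = c(4, 6, 2, 6, 23, 2, 5, 34))

myRank <- function(data, var) {
  var <- deparse(substitute(var))          # capture the column name as a string
  new_col <- paste0("New.", var, ".rank")  # e.g. "New.new.employee.rank"
  # Rank the month-2 values within each year; rnk is an illustrative name.
  temp <- setDT(data)[month == 2L, .(id, rnk = frank(-get(var))), by = year]
  # (new_col) makes := use the string stored in new_col as the column name.
  data[temp, (new_col) := i.rnk, on = c("year", "id")][]
}

res <- myRank(df1, new.employee)
res
```

With this version, the same function can be reused for another column, and the rank column is named after whichever input was passed in.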