I wrote the following function to check one variable of a dataset 'a':
d <- function(x)
{
  a <- sort(levels(as.factor(x)), decreasing=T)[1:3]
  for (i in 1:length(a))
  {
    if (any(table(x[i])==a[i])<600)
    {
      returnlist <- paste(" Month(s) having less data is/are ", x[i])
      return(returnlist)
    }
    else {
      return(print(" All the recent three months have good enough data "))
    }
  }
}
d(a$YEARMONTH)
Now I have three more datasets to check. How can I write a function that takes all four datasets at once and gives their respective results? Do I have to pass the four datasets as arguments? Please also suggest how to write the return statements so that each dataset's name appears as a heading, with the results for that dataset below it.
The variable that I passed into the function looks like this:
Apr-2014
Apr-2015
Apr-2016
Aug-2013
Aug-2014
Aug-2015
Dec-2013
Dec-2014
Dec-2015
Feb-2014....
These are the months (with the year) in which respondents took the surveys, so there are many respondents in each month.
@Frank, thank you for the lapply call above. It worked, but I am getting only the first record of each dataset.
My output currently looks like this:
1 Month(s) having less data is/are 201604
2 Month(s) having less data is/are 201604
3 Month(s) having less data is/are 201604
4 Month(s) having less data is/are 201604
For example, suppose my datasets A, B, C and D have the following yearmonth counts:
A$yearmonth
201604 201603 201602
    34    652    643
B$yearmonth
201604 201603 201602
   678     78     98
C$yearmonth
201604 201603 201602
   675    897    678
D$yearmonth
201604 201603 201602
   566    788     90
Here my function should report, for each dataset, the months whose counts are below 600:
A$yearmonth
201604
    34
B$yearmonth
201603 201602
    78     98
D$yearmonth
201602
    90
I don’t think my function is checking all three values of ‘a’ for each argument. How can this be fixed?
Also, how can I display the counts in the output? And how can I get the argument name into the return statement so that I can relate each result to its dataset?
Expanding on Richard Scriven's comment, using your d function:
lapply(list(A$yearmonth, B$yearmonth, C$yearmonth, D$yearmonth), d)
Going further, here's a different way to write the d function so that it produces the output you have in mind. Note that this version takes the whole data frame rather than the yearmonth column:
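d <- function(df)
{
  # the three most recent months (values such as 201604 sort correctly as text)
  a <- sort(levels(as.factor(df$yearmonth)), decreasing = TRUE)[1:3]
  # count the rows in each of those months
  b <- as.data.frame(table(as.character(df$yearmonth[df$yearmonth %in% a])))
  # keep only the months with fewer than 600 rows
  c <- b[b$Freq < 600, ]
  if (nrow(c) > 0) {
    print(paste("Month(s) having less data is/are", paste(c$Var1, collapse = ", ")))
  } else {
    print(" All the recent three months have good enough data ")
  }
}
lapply(list(A, B, C, D), d)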
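The question also asks for the counts and the dataset names in the output. The following is a minimal sketch of one way to get both, not a definitive implementation. It assumes each dataset has a yearmonth column whose values sort correctly as text (for example 201604). The helpers make_df and low_months are hypothetical names introduced here, and the example data is rebuilt from the counts given in the question.

```r
# Hypothetical example data rebuilt from the counts shown in the question
make_df <- function(counts) {
  data.frame(yearmonth = rep(names(counts), times = counts))
}
A <- make_df(c("201604" = 34,  "201603" = 652, "201602" = 643))
B <- make_df(c("201604" = 678, "201603" = 78,  "201602" = 98))
C <- make_df(c("201604" = 675, "201603" = 897, "201602" = 678))
D <- make_df(c("201604" = 566, "201603" = 788, "201602" = 90))

# low_months() is a hypothetical helper: it returns the counts of the
# n most recent months that fall below the threshold
low_months <- function(df, threshold = 600, n = 3) {
  counts <- table(as.character(df$yearmonth))
  recent <- sort(names(counts), decreasing = TRUE)
  recent <- recent[seq_len(min(n, length(recent)))]
  counts <- counts[recent]
  counts[counts < threshold]
}

# A named list keeps the dataset names attached to the results
datasets <- list(A = A, B = B, C = C, D = D)
results <- lapply(datasets, low_months)

# Print each dataset name as a heading, followed by its low counts
for (name in names(results)) {
  cat(name, "$yearmonth\n", sep = "")
  if (length(results[[name]]) > 0) {
    print(results[[name]])
  } else {
    cat("All the recent three months have good enough data\n")
  }
}
```

Because the list passed to lapply is named, the result list carries the same names, so no extra argument is needed to relate each result to its dataset.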
As @Richard Scriven suggested...
Load your data frames into your workspace and run
# create a named list of all the data frames in the workspace
# (Filter drops every object that is not a data frame)
mydf.list <- Filter(is.data.frame, mget(ls()))
# apply your function to the list of data frames; this returns a list
my.results <- lapply(mydf.list, d)
# to get the results back as a data frame
data.frame(Reduce(rbind, my.results))
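If the dataset name should appear next to each result, one option is to build a single long data frame with one row per low month. This is only a sketch under the same assumptions as the question (a yearmonth column, a threshold of 600, the three most recent months). The names make_df, summarise_low and other_object are hypothetical, and the two example datasets reuse counts from the question.

```r
# Hypothetical stand-ins for the real datasets (counts from the question)
make_df <- function(counts) {
  data.frame(yearmonth = rep(names(counts), times = counts))
}
A <- make_df(c("201604" = 34,  "201603" = 652, "201602" = 643))
C <- make_df(c("201604" = 675, "201603" = 897, "201602" = 678))
other_object <- 1:10  # not a data frame, so it is skipped below

# Fetch every object in the workspace by name, then keep the data frames
all_objects <- mget(ls())
mydf.list <- Filter(is.data.frame, all_objects)

# Hypothetical per-dataset summary: recent months with fewer rows than
# the threshold, or NULL when every recent month has enough data
summarise_low <- function(name, df, threshold = 600, n = 3) {
  counts <- as.data.frame(table(yearmonth = as.character(df$yearmonth)),
                          responseName = "count", stringsAsFactors = FALSE)
  counts <- counts[order(counts$yearmonth, decreasing = TRUE), ]
  counts <- head(counts, n)
  counts <- counts[counts$count < threshold, ]
  if (nrow(counts) == 0) return(NULL)
  data.frame(dataset = name, counts, row.names = NULL)
}

# Combine the per-dataset pieces; rbind ignores the NULL entries
pieces <- Map(summarise_low, names(mydf.list), mydf.list)
low <- do.call(rbind, unname(pieces))
low
```

Datasets with no low months contribute nothing to the combined result. If every dataset passes the check, low is NULL, so a caller would need to handle that case.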