
Use one function for multiple datasets in R

I wrote a function like this to check one variable of a dataset 'a':

d <- function(x) {
  a <- sort(levels(as.factor(x)), decreasing = TRUE)[1:3]
  for (i in 1:length(a)) {
    if (any(table(x[i]) == a[i]) < 600) {
      returnlist <- paste(" Month(s) having less data is/are ", x[i])
    } else {
      return(print(" All the recent three months have good enough data "))
    }
  }
}


Now I have three more datasets to check. How can I write a function that takes all four datasets at once and gives their respective results? Do I have to pass the four datasets as arguments? Also, how should I write the return statements so that each dataset's name appears as a heading, with the results for that dataset below it?

My variable that I passed into the function looks like this:


These values are the year and month in which respondents took the survey, so each month has many respondents.

@Frank, thank you for the lapply suggestion above. It worked, but I am getting only the first record of each dataset.
My output currently looks like this:
1  Month(s) having less data is/are  201604
2  Month(s) having less data is/are  201604
3  Month(s) having less data is/are  201604
4  Month(s) having less data is/are  201604

For example, suppose my datasets a, b, c and d have these counts per yearmonth value:

201604 201603 201602
34  652 643


201604 201603 201602
678 78  98

201604 201603 201602
675 897 678

201604 201603 201602
566 788 90

Here my function should output the months with counts < 600 for each dataset, e.g. for the second one:
201603 201602
78     98

I don’t think my function is checking all three values of ‘a’ for each argument. How can I fix that?
Also, how can I display the counts in the output, and how can I include the argument name in the return statement so that I can relate each output to its dataset?

Expanding on Richard Scriven's comment, using your d function:

lapply(list(A$yearmonth, B$yearmonth, C$yearmonth, D$yearmonth), d)

Going further, here's a different way to construct the d function to produce the output you have in mind:

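d <- function(df) {
  # the three most recent yearmonth values
  a <- sort(levels(as.factor(df$yearmonth)), decreasing = TRUE)[1:3]
  # count the rows in each of those months (columns: Var1, Freq)
  b <- as.data.frame(table(as.character(df$yearmonth[df$yearmonth %in% a])))
  # months with fewer than 600 rows
  c <- b[b$Freq < 600, ]$Var1
  if (length(c) > 0) {
    print(paste("Month(s) having less data is/are", paste(c, collapse = ", ")))
  } else {
    print(" All the recent three months have good enough data ")
  }
}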

lapply(list(A, B, C, D), d)
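
To also show the counts and tie each result to its dataset, one option is to pass a named list to lapply, since the names carry over to the result. The following is only a sketch: low_months and make_df are hypothetical helpers that appear nowhere in the question, and the toy data simply mirrors the example counts given there.

```r
# Hypothetical helper (not from the question): returns a named vector of
# counts for the `n` most recent months that fall below `threshold`.
low_months <- function(yearmonth, n = 3, threshold = 600) {
  tab <- table(as.character(yearmonth))
  counts <- setNames(as.vector(tab), names(tab))
  # yyyymm strings sort chronologically, so the largest are the most recent
  recent <- head(sort(names(counts), decreasing = TRUE), n)
  counts <- counts[recent]
  counts[counts < threshold]
}

# Toy data reproducing the example counts from the question (assumed shape:
# a data frame with a single yearmonth column).
make_df <- function(n4, n3, n2) {
  data.frame(yearmonth = rep(c(201604, 201603, 201602), times = c(n4, n3, n2)))
}
datasets <- list(
  a = make_df(34, 652, 643),
  b = make_df(678, 78, 98),
  c = make_df(675, 897, 678),
  d = make_df(566, 788, 90)
)

# The names of the list become the names of the result
results <- lapply(datasets, function(df) low_months(df$yearmonth))

# Print each dataset name as a heading with its results underneath
for (nm in names(results)) {
  cat("Dataset:", nm, "\n")
  if (length(results[[nm]]) == 0) {
    cat("  All the recent three months have good enough data\n")
  } else {
    print(results[[nm]])
  }
}
```

Returning the counts instead of printing inside the function keeps the check reusable; the formatting is then decided in one place, in the loop.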

As @Richard Scriven suggested...

Load your data frames into your workspace and run

# create a named list of all the data frames in the workspace
mydf.list <- Filter(is.data.frame, mget(ls()))

# apply your function to the list of data frames; this returns a list
my.results <- lapply(mydf.list, d)

# to get the results back as a data frame
data.frame(Reduce(rbind, my.results))
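
If the goal is a single table that records which dataset each low month came from, the same idea can be sketched as below. This is an illustration under stated assumptions, not part of the original answer: summarise_low is a hypothetical helper, and a separate environment named env with made-up data frames A, B and C stands in for the workspace so the example does not depend on what is loaded.

```r
# Hypothetical helper: one row per recent month whose count is below threshold
summarise_low <- function(df, n = 3, threshold = 600) {
  counts <- as.data.frame(table(yearmonth = as.character(df$yearmonth)),
                          responseName = "count", stringsAsFactors = FALSE)
  counts <- counts[order(counts$yearmonth, decreasing = TRUE), ]
  counts <- head(counts, n)
  counts[counts$count < threshold, ]
}

# Stand-in for the workspace (assumed data, for illustration only)
env <- new.env()
env$A <- data.frame(yearmonth = rep(c(201604, 201603), times = c(5, 700)))
env$B <- data.frame(yearmonth = rep(c(201604, 201603, 201602, 201601),
                                    times = c(650, 10, 20, 1)))
env$C <- data.frame(yearmonth = rep(201604, 600))
env$note <- "not a data frame"

# Keep only the data frames; mget() preserves the object names
dfs <- Filter(is.data.frame, mget(ls(env), envir = env))

# Summarise each dataset and label its rows with the dataset name
per_dataset <- lapply(names(dfs), function(nm) {
  out <- summarise_low(dfs[[nm]])
  data.frame(dataset = rep(nm, nrow(out)), out,
             row.names = NULL, stringsAsFactors = FALSE)
})
low <- do.call(rbind, per_dataset)
print(low)
```

Datasets with no month below the threshold contribute zero rows, so they simply do not appear in the combined table.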

