Downloading new data from the internet every time a package is loaded

I have a package that scrapes data from the internet and displays it depending on which function is called. Recently, though, I got a message from CRAN saying that the data becomes stale when a binary build is installed (the download code is in utils.R, so it ran once while the package was being built).

For the past few days, I've tried the following without success:

  • A global variable assigned with <<-, but it generates a CRAN note (Note: no visible binding for global variable), and several answers I read advise against this approach.
  • Creating a new environment and adding the downloaded object to it, but this never worked because I couldn't access the object from other functions. Ref: Where to create package environment variables?

These are the current package files: https://github.com/amrrs/tiobeindexr/tree/master/R

Tried solution:

zzz.r file:

.onLoad <- function (libname, pkgname)
{

  assign("newEnv", new.env(hash = TRUE, parent = parent.frame()))

  newEnv$.all_tablesx789  <- rvest::html_table(xml2::read_html('https://www.tiobe.com/tiobe-index/'))


}

One of the functions in the core code:

hall_of_fame <- function() {

  #check_data()

  #.GlobalEnv$.all_tablesx789 <- check_data()

  newEnv$.all_tablesx789[[4]]

}

The package builds fine, but the object is not found. Error below:

Error in hall_of_fame() : object 'newEnv' not found

I have only a couple of days to save my package on CRAN, and I hope I've provided enough detail to keep this question from being downvoted.

Thanks!

Consider adding memoise as a dependency: it has a minimal dependency chain and gives you in-session caching for free. Then combine it with a package environment and (just for fun) an active binding.

Create a new 📦 environment (you can put this in, say, aaa.R):

.pkgenv <- new.env(parent=emptyenv())

Now (say, in zzz.R) set up one function that grabs the tables:

.get_tiboe_tables <- function(url) {
  message("Delete this since it's just to show caching works") # delete this
  content <- xml2::read_html(url)
  rvest::html_table(content)
}

And "memoise" it (again, in zzz.R):

get_tiboe_tables <- memoise::memoise(.get_tiboe_tables)

Now create an active binding, which lets us access the tables like a variable (i.e. without the ()). It's more "fun" than necessary (again, in zzz.R):

makeActiveBinding(
  sym = "all_tables",
  fun = function() get_tiboe_tables('https://www.tiobe.com/tiobe-index/'),
  env = .pkgenv
)
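To see why the memoised function matters here, note that an active binding re-runs its function on every read. A minimal base-R sketch, with a counter standing in for the download (demo_env, counter and reads are illustrative names, not part of the package):

```r
# Hypothetical stand-ins: a throwaway environment and a call counter.
demo_env <- new.env(parent = emptyenv())
counter  <- new.env()
counter$n <- 0L

# The binding's function runs each time `demo_env$reads` is evaluated.
makeActiveBinding(
  sym = "reads",
  fun = function() {
    counter$n <- counter$n + 1L  # environments are modified in place
    counter$n
  },
  env = demo_env
)

first  <- demo_env$reads  # runs the function once
second <- demo_env$reads  # runs it again
```

If the binding called the scraper directly, each read would trigger a fresh download, which is presumably why the scraper is wrapped in memoise::memoise() first.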

Now get the value like this (notice we get the "loading" message as it "primes" the cache):

str(.pkgenv$all_tables, 1)
## Delete this since it's just to show caching works ** the loading msg
## List of 4
##  $ :'data.frame':    20 obs. of  6 variables:
##  $ :'data.frame':    30 obs. of  3 variables:
##  $ :'data.frame':    15 obs. of  8 variables:
##  $ :'data.frame':    15 obs. of  2 variables:

On subsequent calls there is no loading message since it's retrieving the cached value:

str(.pkgenv$all_tables, 1)
## List of 4
##  $ :'data.frame':    20 obs. of  6 variables:
##  $ :'data.frame':    30 obs. of  3 variables:
##  $ :'data.frame':    15 obs. of  8 variables:
##  $ :'data.frame':    15 obs. of  2 variables:

In the next R session the tables are downloaded again, so the data stays fresh without abusing the site. You can also set the Collate field in DESCRIPTION to control the file load order instead of relying on alphabetically sorted file names such as aaa.R and zzz.R.

Note that you can export the active binding as well and your 📦 users can then use it like a variable instead of calling it like a function.

Actually, I took a slightly different approach from the answer above. Following Thomas' comment, I didn't want to add memoise as a dependency, so I tried an alternative.

Creating a new package environment in aaa.R:

.pkgenv <- new.env(parent=emptyenv())

Loading the tables into that environment using .onAttach() in zzz.R:

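.onAttach <- function(libname, pkgname) {

  packageStartupMessage("Downloading TIOBE Index Data using your Internet...")

  # Store the scraped tables in the package environment; if the download
  # fails, report the error instead of aborting the attach.
  tryCatch({
    .pkgenv$.get_tiboe_tables <- rvest::html_table(xml2::read_html("https://www.tiobe.com/tiobe-index/"))
  },
  error = function(e){
    packageStartupMessage("Downloading TIOBE Index data failed!")
    packageStartupMessage("Error Message:")
    # conditionMessage() extracts just the error text from the condition object
    packageStartupMessage(conditionMessage(e))
    return(NA)
  })

}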
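A possible variation, offered only as a sketch: fill the package environment lazily on first use rather than at attach time, which gives in-session caching without memoise. The names get_tables, fetch, fetch_log and fake_fetch are hypothetical, and the scraper is replaced by an offline stand-in so the example runs without a network.

```r
# Package-level environment, created at top level as in aaa.R.
.pkgenv <- new.env(parent = emptyenv())

# Hypothetical accessor: `fetch` is any zero-argument function returning the
# tables, e.g. a wrapper around the rvest/xml2 calls.
get_tables <- function(fetch, refresh = FALSE) {
  if (refresh || is.null(.pkgenv$tables)) {
    .pkgenv$tables <- fetch()  # runs once per session unless refreshed
  }
  .pkgenv$tables
}

# Offline stand-in for the scraper; counts how often it is called.
fetch_log <- new.env()
fetch_log$n <- 0L
fake_fetch <- function() {
  fetch_log$n <- fetch_log$n + 1L
  list(data.frame(x = 1:2))  # placeholder data, not real TIOBE content
}

a <- get_tables(fake_fetch)  # calls fake_fetch()
b <- get_tables(fake_fetch)  # served from .pkgenv, no second call
```

With this shape, attaching the package never touches the network, and a failed download only affects the function call that needed the data.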

My earlier mistake seems to have been creating the new environment inside .onLoad() itself.
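The scoping behaviour behind this can be reproduced outside a package. assign() called without an envir argument binds the name in the calling function's own frame, and that frame is discarded when the function returns, so other functions never see it. The sketch below uses illustrative names (broken_setup, broken_reader, fixed_setup, fixed_reader) that are not part of the package:

```r
# Mimics the original .onLoad(): `newEnv` is bound only in this call's frame.
broken_setup <- function() {
  assign("newEnv", new.env())
  exists("newEnv", envir = environment(), inherits = FALSE)
}
inside <- broken_setup()  # TRUE: the binding exists during the call

# Mimics hall_of_fame(): a different function cannot find `newEnv`.
broken_reader <- function() newEnv$tables
result <- tryCatch(broken_reader(), error = function(e) conditionMessage(e))

# The fix: create the environment at top level, then fill and read it
# from any function defined alongside it.
.pkgenv <- new.env(parent = emptyenv())
fixed_setup  <- function() .pkgenv$tables <- list("placeholder")
fixed_reader <- function() .pkgenv$tables[[1]]

fixed_setup()
value <- fixed_reader()
```

Here result holds the error message from the failed lookup, while value holds the stored placeholder.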
