
Function to query different time series in R with vector as input

I was getting the minimum and maximum dates for a specific time series from a database fact table as follows:

auxiliar.dates <- function(machine, signal) {
  q.Aux1 <- paste("SELECT
         t1.machine,
       t1.signal,
       t2.signal_name,
       t1.min_snsr_dt,
       t1.max_snsr_dt,
       t1.min_snsr_ts,
       t1.max_snsr_ts,
       t1.min_etl_dt,
       t1.max_etl_dt,
       t1.rec_cnt
       FROM ", config$SF_CONFIG$my_schema_name1, ".mytable1 AS t1 
       LEFT JOIN ", config$SF_CONFIG$my_schema_name1, ".mytable2", "AS t2
       ON t1.signal=t2.signal 
       WHERE t1.unit_key=")
  q.Aux2 <- " AND t1.signal="
  q.Aux.final <- str_c(q.Aux1, machine, q.Aux2, signal)
  res <- dbSendQuery(myConn, q.Aux.final)
  df <- as.data.table(dbFetch(res, n=-1))
  dbClearResult(res)
  return(df)
}

dates <-auxiliar.dates("machine", "signal")

The output of this function is a data table as follows:

[image not available]

Then I was using the output to query the specific signal between min and max ts as follows:

signalQuery <- function(machine, signal, min_ts, max_ts) {

  q1.aux1 <- paste("SELECT snsr_val, 
                      snsr_ts, 
                      snsr_dt, 
                      signal,
                      qual, 
                      machine 
                      FROM ", config$SF_CONFIG$schema_name1, 
                     ".mytable1 AS v
                      WHERE machine=", sep="")

  q3.aux1 <-paste(" AND signal=", signal, " AND snsr_ts BETWEEN ", "'", min_ts, "'",
                    " AND ", "'", max_ts, "'", " ORDER BY v.snsr_ts", sep = "")

  qt.auxtotal <- str_c(q1.aux1,
                     machine,
                     q3.aux1) #we join que full query with stringr library

  res <- dbSendQuery(myConn, qt.auxtotal)
  df <- as.data.table(dbFetch(res,n=-1))
  dbClearResult(res) #cleaning memory
  return(df)
}

To call signal 71 for instance I was doing:

    signal71.dates <- auxiliar.dates(machine, 71)
    df   <- signalQuery(machine, 71, signal71.dates$min_snsr_dt, signal71.dates$max_snsr_dt)

When I needed to query more signals, I followed exactly the same procedure, but used the minimum of the max_snsr_dt values and the maximum of the min_snsr_dt values across my signal_number.dates data frames.

I would now like to change the process a bit so that I can pass a vector of the signals I want to both the auxiliar.dates and signalQuery functions.

My first attempt was to modify this line in auxiliar.dates:

q.Aux2 <- " AND t1.signal="

to:

q.Aux2 <- " AND t1.signal IN ("
q.Aux.final <- str_c(q.Aux1, machine, q.Aux2, paste(signal, ")", sep = ""))

However, when I call the function as:

test <- auxiliar.dates(984, c(70,71))

I get the following error:

Error in new_result(connection@ptr, statement) : Expecting a single string value: [type=character; extent=2].

Could someone help?

BR

Consider the following changes:

  • Parameterization: Avoid excessive string concatenation, which impairs readability and maintainability. Instead, use parameterization, which is supported in DBI + odbc with sqlInterpolate. Ideally, you would hard-code the table names in the SQL string, but since identifiers cannot be parameterized, paste (or paste0 for no separating spaces) will still have to be used for the schema names.

  • Single SQL query: Combine the two SQL queries using a Common Table Expression (CTE), which is supported in Snowflake. Specifically, the first query is joined to the second by machine and signal, with the timestamp restricted to the BETWEEN interval. This lets you merge both functions into one, reduce the number of database trips, and avoid intermediate helper objects.

  • Use dbGetQuery: If you do not need to fetch large result sets in chunks, use dbGetQuery, which combines the dbSendQuery, dbFetch, and dbClearResult steps, for concision.

  • Function inputs: As @r2evans comments, avoid relying inside a function on variables that live in some unknown parent environment (such as config). Instead, pass everything the function needs as input parameters so that all variables are locally scoped.

  • Iteration: Because these functions take scalar parameters, you must iterate over the values, for example with lapply, to run the function once per signal and then row-bind the results into a final data table.
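
To illustrate the parameterization point without a live database, here is a minimal sketch that uses DBI::ANSI() as a stand-in connection. That stand-in is an assumption made only so the example runs offline; real code would pass the actual Snowflake connection. The schema name "my_schema" and the literal values are hypothetical.

```r
library(DBI)

# Hypothetical schema name; identifiers cannot be bound, so paste0() is used
my_schema <- "my_schema"

sql <- paste0("SELECT v.snsr_val, v.snsr_ts ",
              "FROM ", my_schema, ".mytable1 AS v ",
              "WHERE v.machine = ?m_param AND v.signal = ?s_param")

# ANSI() is a dummy connection, used here only to show the interpolation step
query <- sqlInterpolate(ANSI(), sql, m_param = 984, s_param = 71)
print(query)
```

Each ?name placeholder is replaced by the safely quoted value of the argument with the same name, so the values never have to be pasted into the string by hand.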

Single Function

signalQuery <- function(my_schema, machine, signal) { 
    # SQL WITH NAMED ?PLACEHOLDERS (SCHEMA NAMES ARE PASTED IN)
    sql <- paste0("WITH sub AS 
                     (SELECT t1.machine, t1.signal,  t2.signal_name, 
                             t1.min_snsr_dt, t1.max_snsr_dt,
                             t1.min_snsr_ts, t1.max_snsr_ts, 
                             t1.min_etl_dt,  t1.max_etl_dt, t1.rec_cnt
                      FROM ", my_schema, ".mytable1 AS t1 
                      LEFT JOIN ", my_schema, ".mytable2 AS t2
                          ON t1.signal = t2.signal 
                      WHERE t1.unit_key = ?m_param AND t1.signal= ?s_param)

                   SELECT v.snsr_val, v.snsr_ts, v.snsr_dt, v.signal, 
                          v.qual, v.machine 
                   FROM ", my_schema, ".mytable1 AS v
                   INNER JOIN sub
                     ON v.machine = sub.machine
                     AND v.signal = sub.signal
                     AND v.snsr_ts BETWEEN sub.min_snsr_dt AND sub.max_snsr_dt
                   ORDER BY v.snsr_ts")

    # BIND PARAMS TO ?MARK PLACEHOLDERS
    query <- sqlInterpolate(myConn, sql, m_param = machine, s_param = signal)

    # RUN QUERY
    dt <- as.data.table(dbGetQuery(myConn, query))

    return(dt)    
}

Function Calls

# SINGLE SIGNAL VALUE
q.Aux.final <- signalQuery(my_schema = config$SF_CONFIG$my_schema_name1,
                           machine = 984, signal = 70)

# MULTIPLE SIGNAL VALUES
dt_list <- lapply(c(70,71), function(i) 
                    signalQuery(my_schema = config$SF_CONFIG$my_schema_name1,
                                machine = 984, signal = i)
           )

q.Aux.final <- data.table::rbindlist(dt_list)

Multiple Functions

If you do need the first result set for analysis, keep the two functions and follow the same process without the CTE:

auxiliar.dates <- function(my_schema, machine, signal) { 

    sql <- paste0("SELECT t1.machine, t1.signal,  t2.signal_name, 
                          t1.min_snsr_dt, t1.max_snsr_dt,
                          t1.min_snsr_ts, t1.max_snsr_ts, 
                          t1.min_etl_dt,  t1.max_etl_dt, t1.rec_cnt
                   FROM ", my_schema, ".mytable1 AS t1 
                   LEFT JOIN ", my_schema, ".mytable2 AS t2
                         ON t1.signal=t2.signal 
                   WHERE t1.unit_key = ?m_param AND t1.signal= ?s_param")

    query <- sqlInterpolate(myConn, sql, m_param = machine, s_param = signal)
    dt <- as.data.table(dbGetQuery(myConn, query))

    return(dt)    
}


signalQuery <- function(my_schema, machine, signal, min_ts, max_ts) {

    sql <- paste0("SELECT v.snsr_val, v.snsr_ts, v.snsr_dt, v.signal, 
                          v.qual, v.machine 
                   FROM ", my_schema, ".mytable1 AS v
                   WHERE v.machine = ?m_param
                     AND v.signal = ?s_param
                     AND v.snsr_ts BETWEEN ?min_ts_prm AND ?max_ts_prm
                   ORDER BY v.snsr_ts")

    query <- sqlInterpolate(myConn, sql, m_param = machine, s_param = signal,
                            min_ts_prm = min_ts, max_ts_prm = max_ts)
    dt <- as.data.table(dbGetQuery(myConn, query))

    return(dt)    
}

Function Calls

# SINGLE SIGNAL VALUE
signal71.dates <- auxiliar.dates(config$SF_CONFIG$my_schema_name1, 984, 71)

q.Aux.final <- signalQuery(config$SF_CONFIG$my_schema_name1, 984, 71,
                           signal71.dates$min_snsr_dt, signal71.dates$max_snsr_dt)

# MULTIPLE SIGNAL VALUES
# Fetch the min/max dates for each signal, then row-bind into one data table
dt_list <- lapply(c(70,71), function(i) 
                    auxiliar.dates(my_schema = config$SF_CONFIG$my_schema_name1,
                                   machine = 984, signal = i)
           )

signal.dates_dt <- data.table::rbindlist(dt_list)


# Query each signal over its own min/max range, one row of signal.dates_dt at a time
dt_list <- lapply(seq_len(nrow(signal.dates_dt)), function(i) 
                    signalQuery(my_schema = config$SF_CONFIG$my_schema_name1,
                                machine  = signal.dates_dt$machine[i], 
                                signal   = signal.dates_dt$signal[i],
                                min_ts   = signal.dates_dt$min_snsr_dt[i],
                                max_ts   = signal.dates_dt$max_snsr_dt[i])
           )

q.Aux.final <- data.table::rbindlist(dt_list)
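
As for the error in the question itself: paste (like str_c) is vectorised, so pasting c(70,71) produces two query strings where the driver expects one. If you prefer the IN (...) approach over iterating, the vector has to be collapsed into a single string first. Below is a minimal base-R sketch; in_list is a hypothetical helper name, and it assumes the signal IDs are integers.

```r
# Hypothetical helper: collapse a vector of integer IDs into "70, 71"
in_list <- function(x) {
  x <- as.integer(x)                     # only integers may reach the SQL text
  stopifnot(length(x) > 0, !anyNA(x))    # reject empty or non-numeric input
  paste(x, collapse = ", ")
}

signals <- c(70, 71)

# Original attempt: one string per signal, hence "extent=2" in the error
n_strings <- length(paste(signals, ")", sep = ""))

# Collapsed version: a single string that can be appended to the query
where_clause <- paste0(" AND t1.signal IN (", in_list(signals), ")")
print(where_clause)
```

The first query would then return one row of min/max dates per signal, which still has to be reduced or iterated over before the second query is built.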

Update: the error is solved. The connection had expired and I needed to connect again.

I really appreciate your solution. However, I am getting an error whenever I use two schemas as input.

auxiliar.dates <- function(connection, my_schema1, my_schema2, machine, signal) { 

  sql <- paste0("SELECT t1.machine, t1.signal,  t2.signal_name, 
                          t1.min_snsr_dt, t1.max_snsr_dt,
                          t1.min_snsr_ts, t1.max_snsr_ts, 
                          t1.min_etl_dt,  t1.max_etl_dt, t1.rec_cnt
                   FROM ", my_schema1, ".table1 AS t1 
                   LEFT JOIN ", my_schema2, ".table2", " AS t2
                         ON t1.snsr_key = t2.snsr_key
                   WHERE t1.machine = ?m_param AND t1.signal = ?s_param")

  query <- sqlInterpolate(connection, sql, m_param = machine, s_param = signal)
  dt <- as.data.table(dbGetQuery(connection, query))

  return(dt)    
}

However, I get the following error:

 signal1.dates <- auxiliar.dates(myConn, config$SF_CONFIG$my_schema1, config$SF_CONFIG$my_schema2, machine.number, signal.number)
 Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘sqlInterpolate’ for signature ‘"Snowflake"’ 

Do you know why this is happening? When I try with only one schema as input and without passing the connection as a function argument, it works fine.
