I was getting the min and max dates for a specific time series within a database fact table as follows:
auxiliar.dates <- function(machine, signal) {
q.Aux1 <- paste("SELECT
t1.machine,
t1.signal,
t2.signal_name,
t1.min_snsr_dt,
t1.max_snsr_dt,
t1.min_snsr_ts,
t1.max_snsr_ts,
t1.min_etl_dt,
t1.max_etl_dt,
t1.rec_cnt
FROM ", config$SF_CONFIG$my_schema_name1, ".mytable1 AS t1
LEFT JOIN ", config$SF_CONFIG$my_schema_name1, ".mytable2", "AS t2
ON t1.signal=t2.signal
WHERE t1.unit_key=")
q.Aux2 <- " AND t1.signal="
q.Aux.final <- str_c(q.Aux1, machine, q.Aux2, signal)
res <- dbSendQuery(myConn, q.Aux.final)
df <- as.data.table(dbFetch(res, n=-1))
dbClearResult(res)
return(df)
}
dates <-auxiliar.dates("machine", "signal")
The output of this function is a data table as follows:
Then I was using the output to query the specific signal between min and max ts as follows:
signalQuery <- function(machine, signal, min_ts, max_ts) {
q1.aux1 <- paste("SELECT snsr_val,
snsr_ts,
snsr_dt,
signal,
qual,
machine
FROM ", config$SF_CONFIG$schema_name1,
".mytable1 AS v
WHERE machine=", sep="")
q3.aux1 <-paste(" AND signal=", signal, " AND snsr_ts BETWEEN ", "'", min_ts, "'",
" AND ", "'", max_ts, "'", " ORDER BY v.snsr_ts", sep = "")
qt.auxtotal <- str_c(q1.aux1,
machine,
q3.aux1) # join the full query with the stringr library
res <- dbSendQuery(myConn, qt.auxtotal)
df <- as.data.table(dbFetch(res,n=-1))
dbClearResult(res) #cleaning memory
return(df)
}
To call signal 71 for instance I was doing:
signal71.dates <- auxiliar.dates(machine, 71)
df <- signalQuery(machine, 71, signal71.dates$min_snsr_dt, signal71.dates$max_snsr_dt)
When I needed to query more signals, I followed exactly the same procedure, but took the minimum of the max_snsr_dt values and the maximum of the min_snsr_dt values across my signal_number.dates data frames.
I would now like to change the process a bit so that I can pass a vector with the signals I want to both the auxiliar.dates and signalQuery functions.
My first attempt was to change this line in auxiliar.dates:
q.Aux2 <- " AND t1.signal="
to:
q.Aux2 <- " AND t1.signal IN ("
q.Aux.final <- str_c(q.Aux1, machine, q.Aux2, paste(signal, ")", sep = ""))
However when I call the function as:
test <- auxiliar.dates(984, c(70,71))
I get the following error:
Error in new_result(connection@ptr, statement) : Expecting a single string value: [type=character; extent=2].
Will someone be able to support?
BR
Consider the following changes:
Parameterization: Avoid heavy string concatenation, which impairs readability and maintainability. Instead, use parameterization, which DBI + odbc support with sqlInterpolate. Ideally, you would hard-code the table names in the SQL string, but since identifiers cannot be parameterized, paste (or paste0, which adds no spaces between elements) still has to be used for the schema name.
Single SQL query: Combine the two SQL queries using a Common Table Expression (CTE), which Snowflake supports. Specifically, the first query is joined to the second by machine, signal, and the date BETWEEN interval. In turn, you combine both functions, reduce the number of database trips, and avoid intermediate helper objects.
Use dbGetQuery: If you do not need to fetch large result sets in chunks, use dbGetQuery to combine the dbSendQuery, dbFetch, and dbClearResult steps for concision.
Function inputs: As @r2evans comments, avoid relying on variables from an unknown parent environment (such as config) inside a function. Instead, pass everything the function needs as input parameters so that it only uses locally scoped variables.
Iteration: Because these functions use scalar parameters, you must iterate through the values, for example with lapply, to run the functions multiple times and then row-bind the results into a final data table.
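As a side note on the original error: paste() is vectorised, so pasting a two-element signal vector yields two query strings, and the driver expects exactly one. The following is a minimal sketch of an alternative to iterating, assuming an IN list is acceptable for your query. The helper sql_in_list and the simplified SELECT are hypothetical, and DBI's ANSI() dummy connection stands in for the real one (myConn) so the snippet runs without a database.

```r
library(DBI)

# ANSI() is a dummy DBI connection used only so this sketch runs offline;
# in practice pass the real connection (e.g. myConn).
con <- ANSI()

# Hypothetical helper: quote each value, then collapse them into ONE
# comma-separated SQL fragment. SQL() marks it as already escaped.
sql_in_list <- function(conn, values) {
  SQL(paste(dbQuoteLiteral(conn, values), collapse = ", "))
}

signals <- c(70L, 71L)

# Without collapse, paste() returns one string per signal (length 2 here),
# which is what triggers "Expecting a single string value".
n_strings <- length(paste(signals, ")", sep = ""))

# Simplified, hypothetical query text; the real one would include the schema.
sql <- "SELECT * FROM mytable1 WHERE unit_key = ?m_param AND signal IN (?s_list)"
query <- sqlInterpolate(con, sql,
                        m_param = 984L,
                        s_list  = sql_in_list(con, signals))
print(query)
```

The collapsed list is built from quoted literals rather than raw text, so the values are still escaped before they reach the SQL string.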
Single Function
signalQuery <- function(my_schema, machine, signal) {
# PREPARED STATEMENT
sql <- paste0("WITH sub AS
(SELECT t1.machine, t1.signal, t2.signal_name,
t1.min_snsr_dt, t1.max_snsr_dt,
t1.min_snsr_ts, t1.max_snsr_ts,
t1.min_etl_dt, t1.max_etl_dt, t1.rec_cnt
FROM ", my_schema, ".mytable1 AS t1
LEFT JOIN ", my_schema, ".mytable2 AS t2
ON t1.signal = t2.signal
WHERE t1.unit_key = ?m_param AND t1.signal= ?s_param)
SELECT v.snsr_val, v.snsr_ts, v.snsr_dt, v.signal,
v.qual, v.machine
FROM ", my_schema, ".mytable1 AS v
INNER JOIN sub
ON v.machine = sub.machine
AND v.signal = sub.signal
AND v.snsr_ts BETWEEN sub.min_snsr_dt AND sub.max_snsr_dt
ORDER BY v.snsr_ts")
# BIND PARAMS TO ?MARK PLACEHOLDERS
query <- sqlInterpolate(myConn, sql, m_param = machine, s_param = signal)
# RUN QUERY
dt <- as.data.table(dbGetQuery(myConn, query))
return(dt)
}
Function Calls
# SINGLE SIGNAL VALUE
q.Aux.final <- signalQuery(my_schema = config$SF_CONFIG$my_schema_name1,
machine = 984, signal = 70)
# MULTIPLE SIGNAL VALUES
dt_list <- lapply(c(70,71), function(i)
signalQuery(my_schema = config$SF_CONFIG$my_schema_name1,
machine = 984, signal = i)
)
q.Aux.final <- data.table::rbindlist(dt_list)
Multiple Functions
If you do need the first result set for analytical purposes, continue with the same process without the CTE:
auxiliar.dates <- function(my_schema, machine, signal) {
sql <- paste0("SELECT t1.machine, t1.signal, t2.signal_name,
t1.min_snsr_dt, t1.max_snsr_dt,
t1.min_snsr_ts, t1.max_snsr_ts,
t1.min_etl_dt, t1.max_etl_dt, t1.rec_cnt
FROM ", my_schema, ".mytable1 AS t1
LEFT JOIN ", my_schema, ".mytable2 AS t2
ON t1.signal=t2.signal
WHERE t1.unit_key = ?m_param AND t1.signal= ?s_param")
query <- sqlInterpolate(myConn, sql, m_param = machine, s_param = signal)
dt <- as.data.table(dbGetQuery(myConn, query))
return(dt)
}
signalQuery <- function(my_schema, machine, signal, min_ts, max_ts) {
sql <- paste0("SELECT v.snsr_val, v.snsr_ts, v.snsr_dt, v.signal,
v.qual, v.machine
FROM ", my_schema, ".mytable1 AS v
WHERE v.machine = ?m_param
AND v.signal = ?s_param
AND v.snsr_ts BETWEEN ?min_ts_prm AND ?max_ts_prm
ORDER BY v.snsr_ts")
query <- sqlInterpolate(myConn, sql, m_param = machine, s_param = signal,
min_ts_prm = min_ts, max_ts_prm = max_ts)
dt <- as.data.table(dbGetQuery(myConn, query))
return(dt)
}
Function Calls
# SINGLE SIGNAL VALUE
signal71.dates <- auxiliar.dates(config$SF_CONFIG$my_schema_name1, 984, 71)
q.Aux.final <- signalQuery(config$SF_CONFIG$my_schema_name1, 984, 71,
signal71.dates$min_snsr_dt, signal71.dates$max_snsr_dt)
# MULTIPLE SIGNAL VALUES
# first collect the date ranges for each signal
dt_list <- lapply(c(70,71), function(i)
auxiliar.dates(my_schema = config$SF_CONFIG$my_schema_name1,
machine = 984, signal = i)
)
signal.dates_dt <- data.table::rbindlist(dt_list)
# then query each signal within its own date range
dt_list <- lapply(seq_len(nrow(signal.dates_dt)), function(i)
signalQuery(my_schema = config$SF_CONFIG$my_schema_name1,
machine = signal.dates_dt$machine[i],
signal = signal.dates_dt$signal[i],
min_ts = signal.dates_dt$min_snsr_dt[i],
max_ts = signal.dates_dt$max_snsr_dt[i])
)
q.Aux.final <- data.table::rbindlist(dt_list)
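The question also mentions querying several signals over their shared window, that is, the latest min_snsr_dt and the earliest max_snsr_dt. A minimal sketch of that step, assuming a row-bound dates table shaped like the auxiliar.dates output; the sample values and the helper name common_window are hypothetical, and a plain data.frame is used so the snippet runs on its own (the same column access works on a data.table).

```r
# Hypothetical per-signal date ranges, shaped like the auxiliar.dates() output.
dates <- data.frame(
  signal      = c(70, 71),
  min_snsr_dt = as.Date(c("2020-01-01", "2020-02-15")),
  max_snsr_dt = as.Date(c("2020-06-30", "2020-05-31"))
)

# Hypothetical helper: the overlap of all ranges starts at the latest minimum
# and ends at the earliest maximum.
common_window <- function(dates) {
  start <- max(dates$min_snsr_dt)
  end   <- min(dates$max_snsr_dt)
  if (start > end) stop("The signals have no overlapping date range")
  list(min_ts = start, max_ts = end)
}

win <- common_window(dates)
print(win)
```

The resulting win$min_ts and win$max_ts could then be passed as min_ts and max_ts for every signal instead of each signal's own range.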
Update: error solved. The connection had expired and I needed to connect again.
I really appreciate your solution. However, I am getting an error whenever I use two schemas as input.
auxiliar.dates <- function(connection, my_schema1, my_schema2, machine, signal) {
sql <- paste0("SELECT t1.machine, t1.signal, t2.signal_name,
t1.min_snsr_dt, t1.max_snsr_dt,
t1.min_snsr_ts, t1.max_snsr_ts,
t1.min_etl_dt, t1.max_etl_dt, t1.rec_cnt
FROM ", my_schema1, ".table1 AS t1
LEFT JOIN ", my_schema2, ".table2", " AS t2
ON t1.snsr_key = t2.snsr_key
WHERE t1.machine = ?m_param AND t1.signal = ?s_param")
query <- sqlInterpolate(connection, sql, m_param = machine, s_param = signal)
dt <- as.data.table(dbGetQuery(connection, query))
return(dt)
}
However, I get the following error:
signal1.dates <- auxiliar.dates(myConn, config$SF_CONFIG$my_schema1, config$SF_CONFIG$my_schema2, machine.number, signal.number)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘sqlInterpolate’ for signature ‘"Snowflake"’
Do you know why this is happening? When I try with only one input and not specifying the connection as part of the function it just works fine.