Function to query different time series in R with vector as input

I was getting the min and max dates for a specific time series within a database fact table as follows:

auxiliar.dates <- function(machine, signal) {
  q.Aux1 <- paste("SELECT
         t1.machine,
       t1.signal,
       t2.signal_name,
       t1.min_snsr_dt,
       t1.max_snsr_dt,
       t1.min_snsr_ts,
       t1.max_snsr_ts,
       t1.min_etl_dt,
       t1.max_etl_dt,
       t1.rec_cnt
       FROM ", config$SF_CONFIG$my_schema_name1, ".mytable1 AS t1 
       LEFT JOIN ", config$SF_CONFIG$my_schema_name1, ".mytable2", "AS t2
       ON t1.signal=t2.signal 
       WHERE t1.unit_key=")
  q.Aux2 <- " AND t1.signal="
  q.Aux.final <- str_c(q.Aux1, machine, q.Aux2, signal)
  res <- dbSendQuery(myConn, q.Aux.final)
  df <- as.data.table(dbFetch(res, n=-1))
  dbClearResult(res)
  return(df)
}

dates <-auxiliar.dates("machine", "signal")

The output of this function is a data table as follows:

[Image: output data table]

Then I was using the output to query the specific signal between the min and max ts as follows:

signalQuery <- function(machine, signal, min_ts, max_ts) {

  q1.aux1 <- paste("SELECT snsr_val, 
                      snsr_ts, 
                      snsr_dt, 
                      signal,
                      qual, 
                      machine 
                      FROM ", config$SF_CONFIG$schema_name1, 
                     ".mytable1 AS v
                      WHERE machine=", sep="")

  q3.aux1 <-paste(" AND signal=", signal, " AND snsr_ts BETWEEN ", "'", min_ts, "'",
                    " AND ", "'", max_ts, "'", " ORDER BY v.snsr_ts", sep = "")

  qt.auxtotal <- str_c(q1.aux1,
                     machine,
                     q3.aux1) # join the full query with the stringr library

  res <- dbSendQuery(myConn, qt.auxtotal)
  df <- as.data.table(dbFetch(res,n=-1))
  dbClearResult(res) #cleaning memory
  return(df)
}

To query signal 71, for instance, I was doing:

    signal71.dates <- auxiliar.dates(machine, 71)
    df   <- signalQuery(machine, 71, signal71.dates$min_snsr_dt, signal71.dates$max_snsr_dt)

If I need to query more signals, I follow exactly the same procedure, but I take the minimum of max_snsr_dt and the maximum of min_snsr_dt across my signal_number.dates data frames.
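The window described above (latest start, earliest end) can be sketched in base R as follows. The two data frames and their dates are made-up stand-ins for real signal_number.dates results, not values from the actual tables.

```r
# Hypothetical stand-ins for two auxiliar.dates() results (dates are invented)
signal70.dates <- data.frame(min_snsr_dt = as.Date("2020-01-05"),
                             max_snsr_dt = as.Date("2020-03-01"))
signal71.dates <- data.frame(min_snsr_dt = as.Date("2020-01-10"),
                             max_snsr_dt = as.Date("2020-02-20"))

# Stack the per-signal date ranges into one table
all.dates <- rbind(signal70.dates, signal71.dates)

# Common window: the latest start and the earliest end across all signals
common_min <- max(all.dates$min_snsr_dt)
common_max <- min(all.dates$max_snsr_dt)
```

common_min and common_max would then be passed as min_ts and max_ts for every signal.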

I would now like to change the process a bit and be able to pass a vector with the signals that I want to both the auxiliar.dates and signalQuery functions.

My first attempt was to modify auxiliar.dates, changing:

q.Aux2 <- " AND t1.signal="

to:

q.Aux2 <- " AND t1.signal IN ("
q.Aux.final <- str_c(q.Aux1, machine, q.Aux2, paste(signal, ")", sep = ""))

However, when I call the function as:

test <- auxiliar.dates(984, c(70,71))

I get the following error:

Error in new_result(connection@ptr, statement) : Expecting a single string value: [type=character; extent=2].

Would someone be able to help?

BR
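The "extent=2" in the error is consistent with paste() and str_c() being vectorised: with two signals, the attempt builds two query strings instead of one. A minimal base R sketch of the difference, assuming the signals are trusted numeric values (pasting untrusted input into SQL would not be safe):

```r
signal <- c(70, 71)

# What the attempt builds: one string per signal, so the query has length 2
attempt <- paste(signal, ")", sep = "")
length(attempt)

# Collapsing the vector first yields a single string for one IN (...) clause
in_list <- paste(signal, collapse = ", ")
where_tail <- paste0(" AND t1.signal IN (", in_list, ")")
```

Using where_tail in place of q.Aux2 plus the signal would give the driver a single statement.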

Consider the following changes:

  • Parameterization: Avoid excessive string concatenation, which impairs readability and maintainability. Instead, use parameterization, which is supported in DBI + odbc with sqlInterpolate. Ideally, you would hard-code the table names in the SQL string, but since identifiers cannot be parameterized, paste (or paste0 for no spaces in between) will still have to be used.

  • Single SQL query: Combine the two SQL queries using a Common Table Expression (CTE), which is supported in Snowflake. Specifically, the first query is joined to the last query by machine, signal, and the date BETWEEN interval. This way you combine both functions, reduce the number of database trips, and avoid intermediate helper objects.

  • Use dbGetQuery: If you do not need to fetch large result sets in chunks, use dbGetQuery to combine the dbSendQuery, dbFetch, and dbClearResult steps for concision.

  • Function inputs: As @r2evans comments, avoid relying on variables from unknown parent environments inside a local function. Instead, pass all needed inputs as parameters so they become locally scoped variables.

  • Iteration: Because these functions use scalar parameters, you must iterate through the values, for example with lapply, to run the functions multiple times and then row-bind the results into the final data table.
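The parameterization and iteration points can be tried without a database. The sketch below assumes the DBI package is installed and uses DBI's ANSI() dummy connection purely to inspect the interpolated SQL. The schema name and the machine and signal values are hypothetical.

```r
library(DBI)

# Hypothetical schema name and inputs, for illustration only
my_schema <- "my_schema"
signals <- c(70, 71)

# Identifiers cannot be bound, so the schema is pasted in;
# values go through ?name placeholders instead
sql <- paste0("SELECT v.snsr_val, v.snsr_ts FROM ", my_schema, ".mytable1 AS v ",
              "WHERE v.machine = ?m_param AND v.signal = ?s_param")

# One interpolated query per signal; ANSI() stands in for a real connection
queries <- lapply(signals, function(s) {
  sqlInterpolate(ANSI(), sql, m_param = 984, s_param = s)
})

# Plain character versions, one string per signal
queries_chr <- vapply(queries, function(q) as.character(q), character(1))
print(queries_chr)
```

With a live connection, each element would be passed to dbGetQuery and the results row-bound, as in the functions that follow.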

Single Function

signalQuery <- function(my_schema, machine, signal) { 
    # PREPARED STATEMENT 
    sql <- paste0("WITH sub AS 
                     (SELECT t1.machine, t1.signal,  t2.signal_name, 
                             t1.min_snsr_dt, t1.max_snsr_dt,
                             t1.min_snsr_ts, t1.max_snsr_ts, 
                             t1.min_etl_dt,  t1.max_etl_dt, t1.rec_cnt
                      FROM ", my_schema, ".mytable1 AS t1 
                      LEFT JOIN ", my_schema, ".mytable2 AS t2
                          ON t1.signal = t2.signal 
                      WHERE t1.unit_key = ?m_param AND t1.signal= ?s_param)

                   SELECT v.snsr_val, v.snsr_ts, v.snsr_dt, v.signal, 
                          v.qual, v.machine 
                   FROM ", my_schema, ".mytable1 AS v
                   INNER JOIN sub
                     ON v.machine = sub.machine
                     AND v.signal = sub.signal
                     AND v.snsr_ts BETWEEN sub.min_snsr_dt AND sub.max_snsr_dt
                   ORDER BY v.snsr_ts")

    # BIND PARAMS TO ?MARK PLACEHOLDERS
    query <- sqlInterpolate(myConn, sql, m_param = machine, s_param = signal)

    # RUN QUERY
    dt <- as.data.table(dbGetQuery(myConn, query))

    return(dt)    
}

Function Calls

# SINGLE SIGNAL VALUE
q.Aux.final <- signalQuery(my_schema = config$SF_CONFIG$my_schema_name1,
                           machine = 984, signal = 70)

# MULTIPLE SIGNAL VALUES
dt_list <- lapply(c(70,71), function(i) 
                    signalQuery(my_schema = config$SF_CONFIG$my_schema_name1,
                                machine = 984, signal = i)
           )

q.Aux.final <- data.table::rbindlist(dt_list)

Multiple Functions

If you do need the first result set for analytical purposes, continue with the same process without the CTE:

auxiliar.dates <- function(my_schema, machine, signal) { 

    sql <- paste0("SELECT t1.machine, t1.signal,  t2.signal_name, 
                          t1.min_snsr_dt, t1.max_snsr_dt,
                          t1.min_snsr_ts, t1.max_snsr_ts, 
                          t1.min_etl_dt,  t1.max_etl_dt, t1.rec_cnt
                   FROM ", my_schema, ".mytable1 AS t1 
                   LEFT JOIN ", my_schema, ".mytable2 AS t2
                         ON t1.signal=t2.signal 
                   WHERE t1.unit_key = ?m_param AND t1.signal= ?s_param")

    query <- sqlInterpolate(myConn, sql, m_param = machine, s_param = signal)
    dt <- as.data.table(dbGetQuery(myConn, query))

    return(dt)    
}


signalQuery <- function(my_schema, machine, signal, min_ts, max_ts) {

    sql <- paste0("SELECT v.snsr_val, v.snsr_ts, v.snsr_dt, v.signal, 
                          v.qual, v.machine 
                   FROM ", my_schema, ".mytable1 AS v
                   WHERE v.machine = ?m_param
                     AND v.signal = ?s_param
                     AND v.snsr_ts BETWEEN ?min_ts_prm AND ?max_ts_prm
                   ORDER BY v.snsr_ts")

    query <- sqlInterpolate(myConn, sql, m_param = machine, s_param = signal,
                            min_ts_prm = min_ts, max_ts_prm = max_ts)
    dt <- as.data.table(dbGetQuery(myConn, query))

    return(dt)    
}

Function Calls

# SINGLE SIGNAL VALUE
signal71.dates <- auxiliar.dates(config$SF_CONFIG$my_schema_name1, 984, 71)

q.Aux.final <- signalQuery(config$SF_CONFIG$my_schema_name1, 984, 71,
                           signal71.dates$min_snsr_dt, signal71.dates$max_snsr_dt)

# MULTIPLE SIGNAL VALUES
# First collect the date ranges for each signal
dt_list <- lapply(c(70,71), function(i) 
                    auxiliar.dates(my_schema = config$SF_CONFIG$my_schema_name1,
                                   machine = 984, signal = i)
           )

signal.dates_dt <- data.table::rbindlist(dt_list)


# Then query each signal within its own date range
dt_list <- lapply(1:nrow(signal.dates_dt), function(i) 
                    signalQuery(my_schema = config$SF_CONFIG$my_schema_name1,
                                machine   = signal.dates_dt$machine[i], 
                                signal    = signal.dates_dt$signal[i],
                                min_ts    = signal.dates_dt$min_snsr_dt[i],
                                max_ts    = signal.dates_dt$max_snsr_dt[i])
           )

q.Aux.final <- data.table::rbindlist(dt_list)

Update: error solved. The connection had expired and I needed to connect again.

I really appreciate your solution. However, I am getting an error whenever I use two schemas as input.

auxiliar.dates <- function(connection, my_schema1, my_schema2, machine, signal) { 

  sql <- paste0("SELECT t1.machine, t1.signal,  t2.signal_name, 
                          t1.min_snsr_dt, t1.max_snsr_dt,
                          t1.min_snsr_ts, t1.max_snsr_ts, 
                          t1.min_etl_dt,  t1.max_etl_dt, t1.rec_cnt
                   FROM ", my_schema1, ".table1 AS t1 
                   LEFT JOIN ", my_schema2, ".table2", " AS t2
                         ON t1.snsr_key = t2.snsr_key
                   WHERE t1.machine = ?m_param AND t1.signal = ?s_param")

  query <- sqlInterpolate(connection, sql, m_param = machine, s_param = signal)
  dt <- as.data.table(dbGetQuery(connection, query))

  return(dt)    
}

However, I get the following error:

 signal1.dates <- auxiliar.dates(myConn, config$SF_CONFIG$my_schema1, config$SF_CONFIG$my_schema2, machine.number, signal.number)
 Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘sqlInterpolate’ for signature ‘"Snowflake"’ 

Do you know why this is happening? When I try with only one input and without passing the connection as a function argument, it works fine.
