简体   繁体   中英

Preparing character strings for sql in () query within dbGetQuery

I have written the below query to study some sample means for my analysis and this work fine and I do not require to mug up this code. However I need to comprehend the use of "",; sep="" "",; sep="" along with limit statement in dbGetQuery.

df <- data.frame(Sample.Num = integer(),
                 Sample.Mean = integer(),
                 quant.01 = integer(),
                 quant.05 = integer(),
                 quant.10 = integer(),
                 quant.25 = integer(),
                 quant.50 = integer(),
                 quant.75 = integer(),
                 stringsAsFactors = FALSE)
df[1,] <- NA
for (i in 1:500){
  sdf <- dbGetQuery(con,"select col11,col23,col30
                    from schema.db.name
                    where col1 in ('value1')
                    and col2 in ('(v3) - value3')
                    and col3 in ('v0123 - value4')
                    order by random()*600000 limit 100")
  meansample <- mean(sdf$mileage,na.rm = TRUE)
  quant.01 <- quantile(sdf$mileage,na.rm = TRUE,probs = .01)
  quant.05 <- quantile(sdf$mileage,na.rm = TRUE,probs = .05)
  quant.10 <- quantile(sdf$mileage,na.rm = TRUE,probs = .10)
  quant.25 <- quantile(sdf$mileage,na.rm = TRUE,probs = .25)
  quant.50 <- quantile(sdf$mileage,na.rm = TRUE,probs = .50)
  quant.75 <- quantile(sdf$mileage,na.rm = TRUE,probs = .75)
dbDisconnect(con)

Problem:

Col1,2 & 3 have character values. Now I need to check the same for all 7 values of col2 and 9 values of col3. Whenever I use any specific values of col1,2,3 I need to store them in a data frame to perform some operations before the loop ends and restarts with second value.

Hence this calls for studying: Add a dynamic value into RMySQL getQuery

I did study gsub and shQuote as well. I am unable to conceptualize the use of '", df[i], "';", sep = "" despite of several combinations along with limit statement. Had it been a local database I could have used dplyr package to run the loop. I understand seq_along() requires a numeric value and in order to pass through the loop which sets right 'correct value' inside in () query. I did try paste0 and paste arguments along with collapse="," arguements but did not help. I also tried lapply from below post but it does not work inside in () query.

How to do dbGetQuery for loop in R

(v3) - value3, (v1) - value1 etc are the actual elements in col. This is combination of code and description of the attribute col2, col3 etc.

Let's assume that you have a query and want to substitute some range values in it.

sql <- "select col11,col23,col30
                    from schema.db.name
                    where col1 in (%s)
                    and col2 in (%s)
                    and col3 in (%s)
                    order by random()*600000 limit 100"

Here I will use sprintf to do the substitutions in the query, so I have left markers ( %s ) in the string. So let's setup 3 ranges that we want to test for:

col1 <- 1:10
col2 <- c('a', 'f', 'z')
col3 <- c('name1', 'name2')

# create strings that are valid in SQL
col1_sql <- paste(col1, collapse = ',')
col2_sql <- paste0("'", col2, "'", collapse = ',')  # put quotes on strings
col3_sql <- paste0("'", col3, "'", collapse = ',')

# now substitute back in query
sql_new <- sprintf(sql,
                   col1_sql,  # strings we just constructed
                   col2_sql,
                   col3_sql
)

# print out the query
cat(sql_new)

Here is what the query will be:

select col11,col23,col30
    from schema.db.name
    where col1 in (1,2,3,4,5,6,7,8,9,10)
    and col2 in ('a','f','z')
    and col3 in ('name1','name2')
    order by random()*600000 limit 100

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM