How to read the contents of an .sql file into an R script to run a query?

I have tried the readLines and read.csv functions, but they don't work.

Here are the contents of the my_script.sql file:

SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE HireDate >= '1-july-1993'

and it is saved on my Desktop.

Now I want to run this query from my R script. Here is what I have:

conn = connectDb()

fileName <- "C:\\Users\\me\\Desktop\\my_script.sql"
query <- readChar(fileName, file.info(fileName)$size)

query <- gsub("\r", " ", query)
query <- gsub("\n", " ", query)
query <- gsub("", " ", query)

recordSet <- dbSendQuery(conn, query)
rate <- fetch(recordSet, n = -1)

print(rate)
disconnectDb(conn)

And I am not getting anything back in this case. What can I try?

I've had trouble reading SQL files myself, and have found that the syntax often breaks if there are any single-line comments in the SQL. Once the SQL statement is collapsed into a single-line string in R, any double dash in the SQL effectively comments out all the code after it.
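To make the failure mode concrete, here is a minimal sketch in base R. The query text and the variable names sql_lines, one_line, and effective are made up for illustration, and the sub() call only imitates what a database does with a `--` comment; it is not how any driver actually parses SQL.

```r
# Hypothetical three-line query with a single-line comment on the first line
sql_lines <- c(
  "SELECT EmployeeID, -- primary key",
  "       FirstName",
  "FROM Employees"
)

# Joining the lines with spaces puts the whole query on one line
one_line <- paste(sql_lines, collapse = " ")

# Rough imitation of the database: everything from "--" to the end of the
# line counts as a comment, so only the text before it survives
effective <- sub("--.*$", "", one_line)

print(one_line)
print(effective)
```

After the join, the FROM clause sits behind the comment marker, so the part of the statement that the database would still see is incomplete.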

This is a function that I typically use whenever I am reading in a .sql file to be used in R.

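getSQL <- function(filepath) {
  con <- file(filepath, "r")
  on.exit(close(con))  # close the connection even if reading fails
  sql.string <- ""

  while (TRUE) {
    # Read one line at a time; a zero-length result means end of file
    line <- readLines(con, n = 1)

    if (length(line) == 0) {
      break
    }

    # Replace tabs with spaces
    line <- gsub("\\t", " ", line)

    # Turn a single-line comment (-- ...) into a block comment (/* ... */)
    # so it cannot comment out the rest of the one-line query string
    if (grepl("--", line)) {
      line <- paste(sub("--", "/*", line), "*/")
    }

    sql.string <- paste(sql.string, line)
  }

  return(sql.string)
}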
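The same idea can be written without the explicit loop, since readLines(), gsub(), and sub() all work on whole character vectors. The following is a sketch rather than a drop-in replacement: the name getSQLVectorised is hypothetical, and like the function above it assumes that `--` never appears inside a string literal.

```r
getSQLVectorised <- function(filepath) {
  # Read every line of the file at once
  lines <- readLines(filepath, warn = FALSE)

  # Replace tabs with spaces
  lines <- gsub("\t", " ", lines, fixed = TRUE)

  # Wrap single-line comments in /* ... */ so they stay closed after joining
  has_comment <- grepl("--", lines, fixed = TRUE)
  lines[has_comment] <- paste(
    sub("--", "/*", lines[has_comment], fixed = TRUE),
    "*/"
  )

  # Join everything into one string separated by spaces
  paste(lines, collapse = " ")
}

# Usage with a temporary file standing in for a real .sql file
tmp <- tempfile(fileext = ".sql")
writeLines(c("SELECT COUNT(1) -- comment", "FROM my_table"), tmp)
query <- getSQLVectorised(tmp)
print(query)
```

The result can then be passed to the database call in place of the string built by the loop version.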

I've found that for queries spanning multiple lines, the read_file() function from the readr package works well. The only thing you have to be mindful of is to avoid single quotes (double quotes are fine). You can even add comments this way.

Example query, saved as query.sql

SELECT 
COUNT(1) as "my_count"
-- comment goes here
FROM -- tabs work too
  my_table

I can then store the results in a data frame with

df <- dbGetQuery(con, statement = read_file('query.sql'))

You can use the read_file() function from the readr package.

fileName = read_file("C:/Users/me/Desktop/my_script.sql")

You will get a string variable fileName containing the text of the file.

Note: Use / instead of \\ in the file path.
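If you would rather not type the separators by hand, base R can build or convert the path. This is a small sketch that assumes the same Desktop location as in the question; the variable names sqlPath, windowsPath, and converted are illustrative only.

```r
# file.path() joins path components with forward slashes
sqlPath <- file.path("C:", "Users", "me", "Desktop", "my_script.sql")
print(sqlPath)

# An existing Windows-style path can be converted by replacing each backslash
windowsPath <- "C:\\Users\\me\\Desktop\\my_script.sql"
converted <- gsub("\\", "/", windowsPath, fixed = TRUE)
print(converted)
```

Either string can be handed to read_file() in place of the hand-written path.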

The answer by Matt Jewett is quite useful, but I wanted to add that I sometimes encounter the following warning when using that answer to read .sql files generated by SQL Server:

Warning message: In readLines(con, n = 1) : line 1 appears to contain an embedded nul

The first line returned by readLines is often "ÿþ" in these cases (i.e. the UTF-16 byte order mark) and subsequent lines are not read properly. I solved this by opening the SQL file in Microsoft SQL Server Management Studio and selecting

File -> Save As ...

then, on the small down arrow next to the Save button, selecting

Save with Encoding ...

and choosing

Unicode (UTF-8 without signature) - Codepage 65001

from the Encoding dropdown menu.

If you do not have Microsoft SQL Server Management Studio and are using a Windows machine, you could also try opening the file with the default text editor and then selecting

File -> Save As ...

Encoding: UTF-8

to save with a .txt file extension.

Interestingly, changing the file within Microsoft SQL Server Management Studio removes the BOM (byte order mark) altogether, whereas changing the file within the text editor converts the BOM to the UTF-8 BOM. Either way, the query is then read properly using the referenced answer.
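If re-saving the file is not an option, the encoding can also be handled on the R side. The sketch below is one possible approach and not part of the original answer: the function name readSqlText is hypothetical, and it assumes the file is either UTF-16 little-endian with a BOM (the "ÿþ" case above), UTF-8 with a BOM, or plain UTF-8. It inspects the first bytes and passes a matching encoding name to file().

```r
readSqlText <- function(path) {
  # Look at the first three bytes to detect a byte order mark
  head_bytes <- readBin(path, what = "raw", n = 3)

  is_utf16le <- length(head_bytes) >= 2 &&
    identical(head_bytes[1:2], as.raw(c(0xFF, 0xFE)))
  is_utf8_bom <- length(head_bytes) == 3 &&
    identical(head_bytes, as.raw(c(0xEF, 0xBB, 0xBF)))

  # R's file() documentation describes "UTF-16LE" and "UTF-8-BOM" as
  # dropping a leading BOM when reading; plain UTF-8 is the assumed fallback
  encoding <- if (is_utf16le) {
    "UTF-16LE"
  } else if (is_utf8_bom) {
    "UTF-8-BOM"
  } else {
    "UTF-8"
  }

  con <- file(path, open = "r", encoding = encoding)
  on.exit(close(con))

  # Keep the line breaks so single-line comments stay on their own lines
  paste(readLines(con, warn = FALSE), collapse = "\n")
}
```

Called as readSqlText("C:/Users/me/Desktop/my_script.sql"), it would return the query as a single string that keeps its original line breaks.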

The combination of readr and textclean works well without having to create any new functions. read_file() reads the file into a character vector and replace_white() ensures all escape sequence characters are removed from your .sql file. Note: This causes problems if you have comments in your SQL string!

library(readr)
library(textclean)

SQL <- replace_white(read_file("file_path"))
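For SQL that does contain comments, one workaround is to drop them before collapsing the whitespace. The following base R sketch is an illustration under stated assumptions, not textclean's implementation: the helper name flattenSQL is hypothetical, it only handles `--` comments, and it will mangle a query that has `--` inside a string literal.

```r
flattenSQL <- function(sql) {
  # Remove everything from "--" to the end of each line
  sql <- gsub("--[^\n]*", "", sql)

  # Collapse runs of whitespace (newlines, tabs, spaces) into single spaces
  sql <- gsub("\\s+", " ", sql)

  # Trim leading and trailing whitespace
  trimws(sql)
}

# Made-up query text with comments, a tab, and line breaks
sql <- "SELECT\n\tCOUNT(1) AS my_count -- comment goes here\nFROM -- tabs work too\n  my_table\n"
print(flattenSQL(sql))
```

Because the comments are removed while the line breaks are still present, nothing after them gets swallowed when the text is flattened.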
