
Cron Job with R and SQL Server

This is probably going to be an underspecified question, as I'm not looking for a specific fix:

I want to run a machine learning algorithm on some data in a SQL Server database. I'd like to use R to do the calculations, which would involve using R to connect to the database, process the data, and write a table of results back to the database.

Is this possible? My guess is yes. It shouldn't be a problem using a client...

However, would it be possible to set this up on a Linux box as a cron job?

Yes to all!

Your choices for scripting are either Rscript or littler, as discussed in this previous post.

Having struggled with connecting to MSSQL databases from Linux, my recommendation is to use RJDBC for database connections to MSSQL. I used RODBC to connect from Windows, but I was never able to get it working properly on Linux. To get RJDBC working you will need to have Java installed properly on your Linux box, and you may need to change some environment variables (it seems I always have SOMETHING misconfigured with rJava). You will also need to download and install the JDBC drivers for Linux, which you can get directly from Microsoft.

Once you have RJDBC and the drivers installed, the code for pulling data from the database will look something like the following template:

library(RJDBC)

# Load the Microsoft JDBC driver; adjust the jar path to match your install
drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver",
            "/etc/sqljdbc_2.0/sqljdbc4.jar")
conn <- dbConnect(drv, "jdbc:sqlserver://mySqlServer", "userId", "Password")

# Run the query and pull the result into a data frame
sqlText <- "
  SELECT *
  FROM SomeTable;"
myData <- dbGetQuery(conn, sqlText)

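You can write a data frame back to the database with something like the following, where the second argument is the name of the destination table and the third is the data frame to write: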

dbWriteTable(conn, "SomeTable", myData, overwrite = TRUE)

When I do updates to my DB, I generally use dbWriteTable() to create a temporary table on my database server. Then I issue a dbSendUpdate() that appends the temp table to my main table, followed by a second dbSendUpdate() that drops the temporary table. You might find that pattern useful.
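As a rough sketch of that staging pattern (not the answerer's actual code): the helper names staging_sql() and append_via_staging(), and the table names in the usage note, are hypothetical. It assumes the staging table and the main table have the same columns in the same order, and that the "temporary" table is an ordinary table created by dbWriteTable() rather than a SQL Server #temp table.

```r
# Calls are namespace-qualified, so the RJDBC and DBI packages only need to be
# installed, not attached, for this to work.

# Build the two SQL statements used by the pattern. The table names are pasted
# straight into the SQL text, so they must come from trusted code, never from
# user input.
staging_sql <- function(main_table, temp_table) {
  list(
    append = sprintf("INSERT INTO %s SELECT * FROM %s", main_table, temp_table),
    drop   = sprintf("DROP TABLE %s", temp_table)
  )
}

# Write `results` to a staging table, append it to the main table,
# then drop the staging table.
append_via_staging <- function(conn, results, main_table, temp_table) {
  sql <- staging_sql(main_table, temp_table)
  DBI::dbWriteTable(conn, temp_table, results, overwrite = TRUE)
  # Registered after the write succeeds: drop the staging table on exit,
  # even if the append below fails.
  on.exit(RJDBC::dbSendUpdate(conn, sql$drop), add = TRUE)
  RJDBC::dbSendUpdate(conn, sql$append)
  invisible(NULL)
}
```

With an open RJDBC connection this would be called as, for example, append_via_staging(conn, myData, "MainTable", "MainTable_staging").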

The only "gotcha" I ran into was that I could never get a Windows domain/username to work in the connection sequence. I had to set up an individual SQL Server account (like sa).

You can simply write a script containing R code and put this on the first line:

#!/usr/bin/env Rscript

Then change the file permissions to allow execution and add the script to your crontab as you would a bash script.
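A minimal sketch of such a script, with the setup steps written as comments. The file path, log path, schedule, and the run_job() placeholder are all assumptions made for illustration; the real database work would go inside run_job().

```r
#!/usr/bin/env Rscript
# Hypothetical location of this file: /home/me/jobs/score_data.R
#
# One-time setup from a shell:
#   chmod +x /home/me/jobs/score_data.R
#
# Example crontab entry (edit with `crontab -e`): run nightly at 02:30 and
# append both stdout and stderr to a log file.
#   30 2 * * * /home/me/jobs/score_data.R >> /home/me/jobs/score_data.log 2>&1

# Placeholder for the real work: connect, query, fit the model, write back.
run_job <- function() {
  message("job started at ", format(Sys.time()))
  invisible(TRUE)
}

# Run the job and convert any R error into a non-zero exit status,
# so that a failure shows up in the log instead of passing silently.
status <- tryCatch({
  run_job()
  0L
}, error = function(e) {
  message("job failed: ", conditionMessage(e))
  1L
})

# Exit early with the failure code; on success the script simply ends.
if (status != 0L) quit(save = "no", status = status)
```

Cron runs jobs with a minimal environment, so if Rscript or Java is not found when the job fires, it may help to use an absolute path to Rscript in the crontab entry and to set any needed variables (such as JAVA_HOME) explicitly.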


 