Creating and using temporary/volatile database tables in Stata
Is there a way to tweak Stata to work with temporary volatile tables? These tables and their data are deleted after a user logs off the session.
Here's an example of a simple toy SQL query that I am using in Stata and Teradata:
odbc load, exec("
BEGIN TRANSACTION;
CREATE VOLATILE MULTISET TABLE vol_tab AS (
SELECT TOP 10 user_id
FROM dw_users
) WITH DATA
PRIMARY INDEX(user_id)
ON COMMIT PRESERVE ROWS;
SELECT * FROM vol_tab;
END TRANSACTION;
") dsn("mozart");
This is the error message I am getting:
The ODBC driver reported the following diagnostics
[Teradata][ODBC Teradata Driver][Teradata Database] Only an ET or null statement is legal after a DDL Statement.
SQLSTATE=25000
r(682);
The Stata error code means:
error . . . . . . . . Return code 682
could not connect to odbc dsn;
This typically occurs because of incorrect permissions, such as a bad User Name or Password. Use set debug on to display the actual error message generated by the ODBC driver.
As far as I can tell, permissions are fine, since I can pull data if I just execute the "SELECT TOP 10..." query. I set debug on, but it did not produce any additional information.
Session mode is Teradata. ODBC manager is set to unixODBC. I am using Stata 13.1 on an Ubuntu server.
I believe the underlying issue may be that separate connections are established for each SQL statement, so the volatile table evaporates by the time the select is issued. I am waiting on tech support to verify this.
I tried using the odbc sqlfile command as well, but this approach does not work unless I create a permanent table at the end of it. There's no load option with odbc sqlfile.
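The permanent-table workaround mentioned above might look roughly like the following. This is only a sketch: the script name make_tab.sql and the table name perm_tab are hypothetical, and it assumes the script ends by copying the volatile table into a permanent one.

```stata
* make_tab.sql (hypothetical) creates the volatile table and then
* copies its rows into a permanent table named perm_tab (also hypothetical)
odbc sqlfile("make_tab.sql"), dsn("mozart")

* a second odbc command can read the permanent table, because it
* outlives the session that created it
odbc load, table("perm_tab") dsn("mozart") clear
```

The permanent table then has to be dropped by hand once the data are in Stata, which is exactly the housekeeping a volatile table would avoid.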
Volatile tables seem to work just fine in SAS and R. For example, this works perfectly:
library("RODBC")
db <- odbcConnect("mozart")
sqlQuery(db,"CREATE VOLATILE MULTISET TABLE vol_tab AS (
SELECT TOP 10 user_id
FROM dw_users
) WITH DATA
PRIMARY INDEX(user_id)
ON COMMIT PRESERVE ROWS;
")
data<- sqlQuery(db,"select * from vol_tab;",rows_at_time=1)
Perhaps this is because the connection to the DB remains open until close(db).
I'm not familiar with Stata, but I'm guessing that your ODBC is connecting in "ANSI" mode. Try adding this between the create volatile table and the select statements:
commit work;
If that doesn't work, you may need to make two separate calls somehow.
UPDATE: Thinking a bit more about this, perhaps you can try this:
odbc load, exec("select distinct user_id from dw_users where cast(date_confirm as
date) > '2011-09-15'") clear dsn("mozart") lowercase;
In other words, just execute the query in one step; don't try to create a volatile table.
What if you try the following with your connection mode set to TERADATA (which is, more often than not, the default):
odbc load, exec("BT; create volatile table new_usr as
(select top 10 user_id from dw_users) with data primary index(user_id) on commit
preserve rows;
ET;
select * from new_usr;") clear dsn("mozart") lowercase;
The BT; and ET; statements wrap the SQL contained between them in an explicit transaction. This SQL has been tested in SQL Assistant, as I don't have access to the tool you are using. Typically, BT and ET are used to enforce logical transactions (or units of work) that must either complete successfully or be rolled back entirely. This may allow you to get around the issue you are having in your tool.
EDIT
Failing the ability to wrap the volatile table creation in a BT and ET, do you have the ability to create a stored procedure or macro that embeds all the logic necessary to complete the task, and then call that stored procedure or macro from Stata?
This answer is no longer correct. Stata now allows multiple SQL statements as long as the multistatement option is added to the odbc command.
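A minimal sketch of what that might look like with the query from the question. It is untested here and assumes a Stata version that has the multistatement option and a Teradata ODBC driver that accepts several statements in one call:

```stata
* build the SQL in a local macro, one statement per line
local sql "CREATE VOLATILE MULTISET TABLE vol_tab AS (SELECT TOP 10 user_id FROM dw_users) WITH DATA PRIMARY INDEX(user_id) ON COMMIT PRESERVE ROWS;"
local sql "`sql' SELECT * FROM vol_tab;"

* multistatement lets exec() carry several ;-delimited statements
odbc load, exec("`sql'") dsn("mozart") multistatement clear
```

Because both statements travel in a single odbc call, they should share one connection, so the volatile table would still exist when the SELECT runs.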
Stata's odbc command does not allow combining multiple SQL statements into a single odbc command or altering TD's mode. It also creates a separate connection for each odbc command issued, so the volatile table goes poof by the time you want to use it to do something. This makes it impossible to use volatile tables directly.
However, there is a way to use R through Stata to produce a Stata data file. You need to install rsource from SSC and the foreign and RODBC packages in R. The two globals Rterm_path and Rterm_options for rsource can be defined in sysprofile.do or in your own profile.do. As far as I can determine, R does not allow exporting timestamps, so I had to do some conversion of dates and timestamps by hand. These conversions are somewhat at odds with the suggestions in the Stata manuals and the Stata blog.
rsource, terminator(END_OF_R)
library("RODBC")
library("foreign")
db <- odbcConnect("mydsn")
sqlQuery(db,"CREATE VOLATILE MULTISET TABLE vol_tab AS (SELECT ...) WITH DATA PRIMARY INDEX(...) ON COMMIT PRESERVE ROWS;")
data<- sqlQuery(db,"SELECT * FROM vol_tab;",rows_at_time=1)
write.dta(data,"mydata.dta",convert.dates = FALSE)
close(db)
END_OF_R
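* load the file written by R
use "mydata.dta", clear
/* convert dates and timestamps to Stata format */
* R dates count days since 01jan1970; Stata dates count days since 01jan1960
gen stata_date = rdate + td(01jan1970)
format stata_date %td
* R timestamps count seconds since 01jan1970; Stata %tc counts milliseconds since 01jan1960
* (the constant below appears to include a time zone offset as well)
gen double stata_timestamp = (rtimestamp + 315594000)*1000
format stata_timestamp %tc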
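The offsets used in the conversion can be checked with Stata's own date functions. This is plain arithmetic and does not touch the database:

```stata
* days from 01jan1960 (Stata's origin) to 01jan1970 (R's origin)
display td(01jan1970)

* the same gap in seconds, with no time zone shift
display %12.0f td(01jan1970)*86400

* difference between that figure and the constant 315594000, in hours
display (td(01jan1970)*86400 - 315594000)/3600
```

These display 3653, 315619200 and 7, so the timestamp constant is the 1960-to-1970 gap minus seven hours. That looks like a local time zone adjustment, and it would need changing for data recorded in a different zone.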