
Creating and using temporary/volatile database tables in Stata

Addendum: As of Stata 14, volatile tables work without any hacks.

Is there a way to tweak Stata to work with temporary volatile tables? These tables and the data are deleted after a user logs off the session.

Here's an example of a simple toy SQL query that I am using in Stata and Teradata:

odbc load,  exec("
    BEGIN TRANSACTION;
    CREATE VOLATILE MULTISET TABLE vol_tab AS (
        SELECT TOP 10 user_id
        FROM dw_users
    ) WITH DATA
    PRIMARY INDEX(user_id)
    ON COMMIT PRESERVE ROWS;

    SELECT * FROM vol_tab;
    END TRANSACTION;
") dsn("mozart");

This is the error message I am getting:

The ODBC driver reported the following diagnostics
[Teradata][ODBC Teradata Driver][Teradata Database] Only an ET or null statement is legal after a DDL Statement.
SQLSTATE=25000
r(682);

The Stata error code means:

error . . . . . . . . . . . . . . . . . . . . . . . . Return code 682
could not connect to odbc dsn;
This typically occurs because of incorrect permissions, such as a bad User Name or Password. Use set debug on to display the actual error message generated by the ODBC driver.

As far as I can tell, permissions are fine, since I can pull data if I just execute the "SELECT TOP 10..." query. I set debug on, but it did not produce any additional information.

Session mode is Teradata. ODBC manager is set to unixODBC. I am using Stata 13.1 on an Ubuntu server.

I believe the underlying issue may be that separate connections are established for each SQL statement, so the volatile table evaporates by the time the select is issued. I am waiting on tech support to verify this.

I tried using the odbc sqlfile command as well, but this approach does not work unless I create a permanent table at the end of it. There's no load option with odbc sqlfile.

Volatile tables seem to work just fine in SAS and R. For example, this works perfectly:

library("RODBC")
db <- odbcConnect("mozart")
sqlQuery(db,"CREATE VOLATILE MULTISET TABLE vol_tab AS (
         SELECT TOP 10 user_id
         FROM dw_users
     ) WITH DATA
     PRIMARY INDEX(user_id)
     ON COMMIT PRESERVE ROWS;
")
data<- sqlQuery(db,"select * from vol_tab;",rows_at_time=1)

Perhaps this is because the connection to the DB remains open until close(db).

I'm not familiar with Stata, but I'm guessing that your ODBC is connecting in "ANSI" mode. Try adding this between the create volatile table and the select statements:

commit work;

If that doesn't work, you may need to make two separate calls somehow.

UPDATE: Thinking a bit more about this, perhaps you can try this:

odbc load, exec("select distinct user_id from dw_users where cast(date_confirm as
date) > '2011-09-15'") clear dsn("mozart") lowercase;

In other words, just execute the query in one step; don't try to create a volatile table.

What if you try the following with your connection mode as TERADATA (which is more often than not the default):

odbc load, exec("BT; create volatile table new_usr as
(select top 10 user_id from dw_users) with data primary index(user_id) on commit
preserve rows; 
ET;

select * from new_usr;") clear dsn("mozart") lowercase;

The BT; and ET; statements wrap the SQL contained between them in an explicit transaction. This SQL has been tested in SQL Assistant, as I don't have access to the tool you are using. Typically, BT and ET are used to enforce logical transactions (or units of work) that must either complete successfully or be rolled back entirely. This may allow you to get around the issue you are having in your tool.

EDIT

Failing the ability to wrap the volatile table creation in a BT and ET, do you have the ability to create a stored procedure or macro that embeds all the logic necessary to complete the task, and then call the stored procedure or macro from Stata?

Put

BT; --UR LOGIC-- ET;

If anything fails in between, it rolls back.

Got this from here.

This answer is no longer correct. Stata now allows multiple SQL statements as long as the multistatement option is added to the odbc command.
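A minimal sketch of what that might look like, reusing the DSN and table names from the question. It assumes Stata 14 or later and a session in Teradata mode, and it has not been run against a Teradata server here. The local macro names `create` and `select` are only illustrative.

```stata
* Build the SQL in pieces so each line stays readable.
local create "CREATE VOLATILE MULTISET TABLE vol_tab AS (SELECT TOP 10 user_id FROM dw_users)"
local create "`create' WITH DATA PRIMARY INDEX(user_id) ON COMMIT PRESERVE ROWS;"
local select "SELECT * FROM vol_tab;"

* multistatement lets exec() carry several ;-separated statements in one odbc call,
* so the volatile table still exists when the SELECT runs.
odbc load, exec("`create' `select'") dsn("mozart") multistatement clear
```

Because both statements travel in a single odbc command, the CREATE and the SELECT share one connection, which is what the volatile table needs.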


Stata's odbc command does not allow combining multiple SQL statements into a single odbc command and altering TD's mode. It also creates a separate connection for each odbc command issued, so the volatile table goes poof by the time you want to use it to do something. This makes it impossible to use volatile tables directly.

However, there is a way to use R through Stata to produce a Stata data file. You need to install rsource from SSC and the foreign and RODBC packages in R. The two globals Rterm_path and Rterm_options for rsource can be defined in sysprofile.ado or in your own profile.ado. As far as I can determine, R does not allow exporting timestamps, so I had to do some conversion of dates and timestamps by hand. These conversions are somewhat at odds with the suggestions in the Stata manuals and the Stata blog.

rsource, terminator(END_OF_R)
  library("RODBC")
  library("foreign")
  db <- odbcConnect("mydsn")
  sqlQuery(db,"CREATE VOLATILE MULTISET TABLE vol_tab AS (SELECT ...) WITH DATA PRIMARY INDEX(...) ON COMMIT PRESERVE ROWS;")
  data<- sqlQuery(db,"SELECT * FROM vol_tab;",rows_at_time=1)
  write.dta(data,"mydata.dta",convert.dates = FALSE)
  close(db)
END_OF_R

use "mydata.dta", clear
/* convert dates and timestamps to Stata format */
gen stata_date = rdate + td(01jan1970)
format stata_date %td
gen double stata_timestamp = (rtimestamp + 315594000)*1000
format stata_timestamp %tc
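The two constants in the conversion can be sanity-checked from within Stata. R stores dates as days and timestamps as seconds since 1 January 1970, whereas Stata counts days (%td) and milliseconds (%tc) since 1 January 1960. The sketch below only does that arithmetic. Reading the leftover 7-hour difference as a local time-zone offset is an assumption, not something the answer states.

```stata
* Days between Stata's epoch (01jan1960) and the Unix epoch (01jan1970): 3653
display td(01jan1970)

* The same gap in seconds: 315619200
display %12.0f td(01jan1970)*86400

* Hours by which the constant 315594000 falls short of that gap: 7
display (td(01jan1970)*86400 - 315594000)/3600
```

If your timestamps should be kept in UTC, or your server sits in a different time zone, the timestamp constant would need to be adjusted accordingly.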
