
ODBC connect, get a table in R

I have been wrestling with this issue the whole day:

I want to access data on Hadoop (through Hive), and I have installed the odbc package.

I am able to connect to the server:

library(DBI)
con <- dbConnect(odbc::odbc(), "hadoop")

I can also see the table that I want to load into R:

dbListTables(con, schema = "aacs")

The output is:

   [1] "dev_1"                  "dev_2"     
   [3] "dev_3"                  "dev_4"

I want to have "dev_4" as a data frame in my R environment. I tried:

db_orders <- tbl(con, "dev_4")

But I got the error "Table or view not found". The following line did not work either:

db_orders <- tbl(con, "aacs.dev_4")
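With dplyr/dbplyr, a table in a non-default schema is usually referenced with `dbplyr::in_schema()` rather than a dotted string, which is likely to be quoted as one identifier. A minimal sketch, assuming dplyr and dbplyr are installed and using the "hadoop" DSN from the question; `collect_schema_table()` is a hypothetical helper, not part of either package:

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Hypothetical helper: refer to schema and table as separate identifiers via
# in_schema() instead of a dotted string, then pull the rows into R.
collect_schema_table <- function(con, schema, table, n = Inf) {
  remote <- tbl(con, in_schema(schema, table))  # lazy reference, no rows fetched yet
  if (is.finite(n)) {
    remote <- head(remote, n)                   # dbplyr translates this to LIMIT n
  }
  collect(remote)                               # runs the query, returns a tibble
}

# With the connection from the question (DSN "hadoop"):
# dev_4 <- collect_schema_table(con, "aacs", "dev_4", n = 100)
```

`tbl()` alone only builds a lazy query; `collect()` is the step that actually executes it and returns a local data frame.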

How can I get that data table into my R environment?

EDIT 1

I tried running the following two queries:

result <- dbSendQuery(con, "SELECT * FROM aacs.dev_4")

I received the error "No space left on device".

OK, so let's limit the query then:

result <- dbSendQuery(con, "SELECT * FROM aacs.dev_4 LIMIT 100")

But again, I got the same error:

Error: <SQL> 'SELECT * FROM aacs.dev_4 limit 100'
  nanodbc/nanodbc.cpp:1587: HY000: [Hortonworks][Hardy] (35) Error from server: error code: '2' error message: 'Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_15177720341_0081_2_08, diagnostics=[Task failed, taskId=task_15177723341_0081_2_08_000146, diagnostics=[TaskAttempt 0 failed, info=[Error: FS Error in Child JVM:org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:261)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.tez.runtime.library.common.sort.impl.IFileOutputStream.write(IFil

Does anyone have an idea how to solve this? It is strange that it says there is no space left, because I have plenty of space (enough to store the data!).
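Independently of the failure above, `dbSendQuery()` on its own only submits the statement: rows are retrieved with `dbFetch()`, and the result set should be released with `dbClearResult()`. A sketch of chunked fetching with these DBI calls; `fetch_in_chunks()` and its default chunk size are illustrative choices, not part of DBI. This only limits how much data R holds at once. The "No space left on device" message comes from a Tez task on the Hive side ("Error from server"), so chunked fetching will not make that error go away.

```r
library(DBI)

# Sketch: submit a query with dbSendQuery(), pull rows with dbFetch() in
# chunks, and always release the result set with dbClearResult().
fetch_in_chunks <- function(con, sql, chunk_size = 10000) {
  res <- dbSendQuery(con, sql)
  on.exit(dbClearResult(res), add = TRUE)   # runs even if a fetch fails
  chunks <- list()
  repeat {
    chunk <- dbFetch(res, n = chunk_size)   # at most chunk_size rows per call
    chunks[[length(chunks) + 1]] <- chunk
    if (dbHasCompleted(res)) break
  }
  do.call(rbind, chunks)                    # combine chunks into one data frame
}

# With the connection from the question:
# rows <- fetch_in_chunks(con, "SELECT * FROM aacs.dev_3")
```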

EDIT 2

As @Florian suggested:

data <- dbReadTable(con, "aacs.dev_4") 

This led to the following error:

Error: <SQL> 'SELECT * FROM `aacs.dev_4`'
  nanodbc/nanodbc.cpp:1587: HY000: [Hortonworks][Hardy] (35) Error from server: error code: '2' error message: 'Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_1517772023341_0082_1_08, diagnostics=[Task failed, taskId=task_1517772023341_0082_1_08_000236, diagnostics=[TaskAttempt 0 failed, info=[Error: exceptionThrown=org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in fetcher {Map_4} #10
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:360)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java
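The generated SQL in this error, SELECT * FROM `aacs.dev_4`, shows that the dotted string was quoted as a single identifier. Whether or not Hive accepts that form, quoting schema and table separately is the more portable approach. A sketch using `DBI::Id()` with `dbQuoteIdentifier()`; `read_schema_table()` and its `limit` argument are hypothetical:

```r
library(DBI)

# Hypothetical helper: quote schema and table as two identifiers with Id(),
# e.g. `aacs`.`dev_4`, and optionally cap the row count with LIMIT.
read_schema_table <- function(con, schema, table, limit = NULL) {
  target <- dbQuoteIdentifier(con, Id(schema = schema, table = table))
  sql <- paste("SELECT * FROM", target)
  if (!is.null(limit)) {
    sql <- paste(sql, "LIMIT", as.integer(limit))
  }
  dbGetQuery(con, sql)
}

# With the connection from the question:
# data <- read_schema_table(con, "aacs", "dev_4", limit = 100)
```

Depending on the installed DBI and odbc versions, `dbReadTable(con, Id(schema = "aacs", table = "dev_4"))` may also work directly; check the documentation of the versions you have.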

Try

x <- dbReadTable(con, "dev_4")

Full working example:

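library(DBI)
library(RSQLite)

# Throwaway in-memory SQLite database, just for the demonstration
con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbListTables(con)                    # character(0): no tables yet
dbWriteTable(con, "mtcars", mtcars)  # copy a data frame into the database
x <- dbReadTable(con, "mtcars")      # read the whole table back as a data frame

dbDisconnect(con)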

Hope this helps!

I know what went wrong...

The problem is an error in the view ("dev_4") itself. When I ran the same code on another table:

test <- dbReadTable(con, "dev_3") 

I got a data frame back. Time to contact the data engineer... thanks for your help!
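Narrowing a problem down to one view, as done here by comparing dev_3 and dev_4, can be scripted. A sketch with a hypothetical `probe_tables()` helper that runs a small `LIMIT` query on each named table and records any error message. On Hive each probe may still start a cluster job, and, as EDIT 1 showed, a `LIMIT` does not necessarily stop a broken view from failing.

```r
library(DBI)

# Hypothetical helper: run a small LIMIT query against each named table/view
# and record whether it succeeded, keeping the error message if it did not.
probe_tables <- function(con, schema, tables, n = 1) {
  rows <- lapply(tables, function(table) {
    target <- dbQuoteIdentifier(con, Id(schema = schema, table = table))
    sql <- paste("SELECT * FROM", target, "LIMIT", as.integer(n))
    msg <- tryCatch({
      dbGetQuery(con, sql)
      NA_character_                          # query succeeded
    }, error = function(e) conditionMessage(e))
    data.frame(table = table, readable = is.na(msg), error = msg,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# With the connection from the question:
# probe_tables(con, "aacs", c("dev_3", "dev_4"))
```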

I think you are looking for

data <- dbGetQuery(con, "SELECT * FROM aacs.dev_4 LIMIT 100")
