I have been wrestling with this issue the whole day.
I want to access data on Hadoop (through Hive), and I have installed the odbc package.
I am able to connect to the server:
con <- dbConnect(odbc::odbc(), "hadoop")
And I am able to see the table that I want to get in R:
dbListTables(con, schema = "aacs")
The output is:
[1] "dev_1" "dev_2"
[3] "dev_3" "dev_4"
I want to have "dev_4" (in a data frame) in my R environment. I tried:
db_orders <- tbl(con, "dev_4")
But I got an error: Table or view not found. The next line did not work either:
db_orders <- tbl(con, "aacs.dev_4")
How can I get that data table in my R environment?
EDIT 1
I tried running the following two queries:
result <- dbSendQuery(con, "SELECT * FROM aacs.dev_4")
I received an error: No space left on device.
Ok, so let's reduce the query then:
result <- dbSendQuery(con, "SELECT * FROM aacs.dev_4 LIMIT 100")
But again, the same error:
Error: <SQL> 'SELECT * FROM aacs.dev_4 limit 100'
nanodbc/nanodbc.cpp:1587: HY000: [Hortonworks][Hardy] (35) Error from server: error code: '2' error message: 'Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_15177720341_0081_2_08, diagnostics=[Task failed, taskId=task_15177723341_0081_2_08_000146, diagnostics=[TaskAttempt 0 failed, info=[Error: FS Error in Child JVM:org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:261)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.tez.runtime.library.common.sort.impl.IFileOutputStream.write(IFil
Does anyone have an idea how to solve this? It is strange that there is no space left, because I have plenty of disk space (enough to store the data!).
EDIT 2
As @Florian suggested:
data <- dbReadTable(con, "aacs.dev_4")
This led to the following error:
Error: <SQL> 'SELECT * FROM `aacs.dev_4`'
nanodbc/nanodbc.cpp:1587: HY000: [Hortonworks][Hardy] (35) Error from server: error code: '2' error message: 'Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_1517772023341_0082_1_08, diagnostics=[Task failed, taskId=task_1517772023341_0082_1_08_000236, diagnostics=[TaskAttempt 0 failed, info=[Error: exceptionThrown=org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in fetcher {Map_4} #10
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:360)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java
Try
x <- dbReadTable(con, "dev_4")
Full working example:
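library(DBI)
library(RSQLite)
# In-memory SQLite database, used here only as a stand-in for the real connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbListTables(con)                     # no tables yet
dbWriteTable(con, "mtcars", mtcars)   # copy a data frame into the database
x <- dbReadTable(con, "mtcars")       # read the whole table back as a data frame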
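If the table lives in a schema other than the default one, a schema-qualified name can be passed as a DBI::Id() object instead of a single string such as "aacs.dev_4", which gets quoted as one identifier (as the generated SQL in EDIT 2 shows). The sketch below is only an illustration on the same in-memory SQLite database, where "main" is SQLite's default schema; with the Hive connection from the question the components would presumably be schema = "aacs" and table = "dev_4", which has not been tested here.

```r
library(DBI)

# In-memory SQLite database as a stand-in for the real Hive connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# "main" is SQLite's default schema. For the question's setup the
# assumed equivalent would be Id(schema = "aacs", table = "dev_4").
tbl_id <- Id(schema = "main", table = "mtcars")

# Each component is quoted separately, not as one dotted name
print(dbQuoteIdentifier(con, tbl_id))

# Read the schema-qualified table into a data frame
x <- dbReadTable(con, tbl_id)
print(dim(x))

dbDisconnect(con)
```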
Hope this helps!
I know what went wrong...
The error is in the view itself ("dev_4"), because when I ran:
test <- dbReadTable(con, "dev_3")
I got the data frame returned. Time to talk to the data engineer... thanks for your help!
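The check described above (reading a neighbouring table to see whether the problem is specific to one view) can be applied to several tables at once. The following is a minimal sketch, not what was actually run: it uses an in-memory SQLite database as a stand-in, try_read() is a hypothetical helper, and the view "dev_4" is deliberately broken by dropping the table it selects from.

```r
library(DBI)

# Stand-in database: two views, one of which points at a dropped table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "base_tbl", mtcars)
dbExecute(con, "CREATE VIEW dev_3 AS SELECT * FROM base_tbl")
dbWriteTable(con, "tmp_tbl", mtcars)
dbExecute(con, "CREATE VIEW dev_4 AS SELECT * FROM tmp_tbl")
dbExecute(con, "DROP TABLE tmp_tbl")  # dev_4 is now a broken view

# Hypothetical helper: TRUE if a one-row read succeeds, FALSE on any error
try_read <- function(con, name) {
  sql <- paste("SELECT * FROM", dbQuoteIdentifier(con, name), "LIMIT 1")
  tryCatch({
    dbGetQuery(con, sql)
    TRUE
  }, error = function(e) FALSE)
}

# Probe each table or view and collect the results in a named logical vector
status <- vapply(c("dev_3", "dev_4"), try_read, logical(1), con = con)
print(status)

dbDisconnect(con)
```

A FALSE entry only says that the read failed; the actual error message (available as conditionMessage(e) inside the handler) is still needed to see why.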
I think you are looking for:
data <- dbGetQuery(con, "SELECT * FROM aacs.dev_4 LIMIT 100")
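For context, dbSendQuery() (used in the question) only returns a result handle; the rows still have to be retrieved with dbFetch() and the handle released with dbClearResult(). dbGetQuery() does all three steps in one call. The sketch below illustrates this on an in-memory SQLite database rather than on the Hive connection, so the table name and query are stand-ins.

```r
library(DBI)

# In-memory SQLite database as a stand-in for the real connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

sql <- "SELECT * FROM mtcars LIMIT 5"

# One call: send the query, fetch all rows, clear the result
direct <- dbGetQuery(con, sql)

# The same thing done step by step
res <- dbSendQuery(con, sql)   # returns a result handle, not the data
stepwise <- dbFetch(res)       # retrieves the rows as a data frame
dbClearResult(res)             # frees the result handle

print(identical(direct, stepwise))

dbDisconnect(con)
```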