
Define column class when data is imported using RJDBC in R

I am trying to import a very large data set from a HANA database into R. One problem with the RJDBC package is that all columns containing characters are loaded as character columns. In our case, loading the column as a factor would be much more efficient, since it has only a few unique values. Is it possible to define the column classes somewhere in the RJDBC call, and where is the column class conversion carried out? It would be great if the conversion to factor were carried out in HANA, because that would reduce the number of gigabytes that have to be transferred to R. Example code:

dbFetch(dbSendQuery(conn = hana_connection, statement = 'select CHAR_COL FROM TABLE_NAME'))

The documentation at https://www.rforge.net/RJDBC/ talks about DBML statements that are needed.

This is really a matter of RJDBC itself. Reading character values as factors works fine, and is easy to achieve, with RODBC:

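library(RODBC)
ch <- odbcConnect("S12")
# On R >= 4.0 the factor conversion may require stringsAsFactors = TRUE in sqlQuery().
fact <- sqlQuery(ch, "SELECT TOP 50 'x'||DIM10 AS CHARCOL FROM FACT ORDER BY DIM10 ASC")
str(fact)
odbcClose(ch)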

'data.frame':   50 obs. of  1 variable:
 $ CHARCOL: Factor w/ 1 level "x0": 1 1 1 1 1 1 1 1 1 1 ...
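
With RJDBC, one workaround is to convert the character columns after the fetch. The following is only a minimal sketch: `chars_to_factors` is a hypothetical helper name, not an RJDBC function, and the small data frame stands in for the result of the query in the question. The conversion runs in R, so it does not reduce the amount of data transferred from HANA.

```r
# Hypothetical helper (not part of RJDBC): turn every character column of a
# data frame into a factor, leaving all other columns untouched.
chars_to_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], factor)
  }
  df
}

# Stand-in for: dbGetQuery(hana_connection, "select CHAR_COL FROM TABLE_NAME")
fetched <- data.frame(CHAR_COL = c("a", "b", "a"), N = 1:3,
                      stringsAsFactors = FALSE)
converted <- chars_to_factors(fetched)
str(converted)
```

For a very large table, the same helper could be applied to each chunk returned by `dbFetch(res, n = ...)`, although the factor levels of the chunks would then have to be reconciled before combining them.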

All this aside, it's typically not the best approach to pull large volumes of data from HANA into R. Instead, all required transformations and filters are best applied in HANA before moving the data over.
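
As an illustration of applying the transformation before the transfer, the query itself can collapse the column to one row per distinct value. This is a sketch under assumptions: `count_by_value_sql` is a hypothetical helper, the table and column names are the placeholders from the question, and the names are pasted into the SQL unescaped, so it is only suitable for trusted identifiers.

```r
# Hypothetical helper: build a query that makes the database return one row
# per distinct value with its count, instead of every raw row.
count_by_value_sql <- function(table, column) {
  sprintf("SELECT %s, COUNT(*) AS N FROM %s GROUP BY %s", column, table, column)
}

sql <- count_by_value_sql("TABLE_NAME", "CHAR_COL")
print(sql)

# With an open RJDBC connection such as hana_connection from the question:
# counts <- DBI::dbGetQuery(hana_connection, sql)
```

Only the aggregated result then has to travel to R, which is the point of filtering and transforming on the database side.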
