
Define column class when data is imported using RJDBC in R

I am trying to import a very large data set from a HANA database into R. One of the problems with the RJDBC package is that all columns containing character data are loaded as character columns. In our case, loading the column as a factor would be much more efficient, since there are only a few unique values. Is it possible to define the column classes somewhere in the RJDBC call, and where is the column class conversion carried out? It would be great if the conversion to factor were carried out in HANA, because that would decrease the number of GB that have to be transported to R. Example code:

dbFetch(dbSendQuery(conn = hana_connection, statement = 'select CHAR_COL FROM TABLE_NAME'))

The documentation at https://www.rforge.net/RJDBC/ talks about DBML statements that are needed.

This really is a matter of RJDBC. Reading character values as factors works fine (and is easy to achieve) with RODBC.

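library(RODBC)

# Connect through the ODBC data source named "S12"
ch <- odbcConnect("S12")
# In the output below, the character column arrives as a factor
fact <- sqlQuery(ch, "SELECT TOP 50 'x' || DIM10 AS CHARCOL FROM FACT ORDER BY DIM10 ASC")
str(fact)
odbcClose(ch)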

 str(fact)
'data.frame':   50 obs. of  1 variable:
 $ CHARCOL: Factor w/ 1 level "x0": 1 1 1 1 1 1 1 1 1 1 ...
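
If RJDBC has to stay, one workaround is to convert the character columns after the fetch. The following is only a minimal sketch: the data frame `res` is a made-up stand-in for what `dbFetch` would return, and its column names and values are assumptions for illustration. Note that this conversion happens in R, so it saves memory after loading but does not reduce the amount of data transported from HANA.

```r
# Stand-in for a result that RJDBC would return with character columns;
# the column names and values here are made up for illustration.
res <- data.frame(
  CHAR_COL = c("a", "b", "a", "a"),
  NUM_COL  = c(1.5, 2.0, 3.5, 4.0),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor, leaving other columns untouched.
is_chr <- vapply(res, is.character, logical(1))
res[is_chr] <- lapply(res[is_chr], factor)

str(res)
```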

All this aside, pulling mass data from HANA into R is typically not the best approach. Instead, all required transformations and filters are best applied before moving the data over.
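
As one possible illustration of doing the work on the database side, the repeated strings could be replaced by integer codes in SQL, with the few distinct values fetched once as a lookup. This is a sketch under assumptions, not a tested HANA recipe: `TABLE_NAME`, `CHAR_COL` and `hana_connection` are taken from the question, the two query strings are hypothetical, and the query results are simulated so the snippet runs without a database. It also assumes both queries sort the values the same way and that the column contains no NULLs.

```r
# Hypothetical queries: ship integer codes plus a small lookup table
# instead of the repeated strings. DENSE_RANK is a standard SQL window function.
codes_sql <- "SELECT DENSE_RANK() OVER (ORDER BY CHAR_COL) AS CODE FROM TABLE_NAME"
lvls_sql  <- "SELECT DISTINCT CHAR_COL FROM TABLE_NAME ORDER BY CHAR_COL"

# With a live connection one would run, for example:
#   codes <- dbGetQuery(hana_connection, codes_sql)$CODE
#   lvls  <- dbGetQuery(hana_connection, lvls_sql)$CHAR_COL
# Simulated results (assumption), so the sketch runs without a database:
codes <- c(1, 2, 1, 1, 3)
lvls  <- c("a", "b", "c")

# Rebuild the factor in R from the codes and the lookup values.
fact_col <- factor(lvls[codes], levels = lvls)
str(fact_col)
```

Only the codes and the short lookup cross the wire in this variant, which is where the saving would come from when there are few unique values.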

