
In the Python h2o module, how do I specify na_strings when using h2o.import_sql_select() to import data?

I'm trying to import data from a MySQL table into an H2OFrame using h2o.import_sql_select(). I'd like NULL or empty values in VARCHAR columns in the database to be recognized as NAs when imported into the H2OFrame, but they are being treated as empty string literals. For numeric columns, however, NULL values are automatically recognized as NAs.

Here's the code I have:

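import h2o
h2o.init()  # the MySQL JDBC driver must be on the H2O classpath for import_sql_select to work
select_query = 'SELECT * FROM my_table'
train_data = h2o.import_sql_select("jdbc:mysql://localhost:3306/my_schema", select_query, "username", "password", use_temp_table=False)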

train_data['my_string_column'].isna() always returns 0, even for values that are NULL or empty in the database.

However, when I dump the data to CSV, import it with h2o.import_file('/path/to/file.csv', na_strings=['']), and then run train_data['my_string_column'].isna(), I can see that the empty values are correctly recognized as NAs because of the na_strings parameter.
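A small stdlib-only sketch of why na_strings matters on the CSV route: once NULLs and empty strings are written to a CSV file they look identical, and only a rule like na_strings=[''] turns them back into missing values. The rows below and the apply_na_strings helper are hypothetical; the helper only models the idea behind the parameter and is not H2O's implementation.

```python
import csv
import io

# Rows as a Python DB driver typically returns them: None for SQL NULL,
# "" for an empty VARCHAR value.
rows = [(1, "a"), (2, None), (3, "")]

# Dump to CSV. csv.writer writes None as an empty field, so after this
# step a NULL and an empty string are indistinguishable in the file.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
buf.seek(0)
names = [name for _id, name in csv.reader(buf)]


def apply_na_strings(values, na_strings):
    """Hypothetical helper: treat any value listed in na_strings as missing (None)."""
    na = set(na_strings)
    return [None if v in na else v for v in values]


print(names)                          # ['a', '', '']
print(apply_na_strings(names, [""]))  # ['a', None, None]
```

Note that the CSV round trip is lossy: once written, a NULL and an empty string can no longer be told apart, so na_strings=[''] marks both as NA.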

Is there a way to specify na_strings, or some other workaround, to get the expected behavior when importing data with h2o.import_sql_select()?

No such feature is currently implemented. This is simply because, unlike CSV, which has no way to distinguish an empty string from NULL, SQL has a native notion of NULL, so such a feature did not seem necessary.

However, you are saying that for string columns you are not getting any NA values in your H2O frame, which sounds like a bug; I will look into it.
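To illustrate the NULL versus empty-string distinction the answer relies on, here is a sketch that uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption made only so the example runs without a server; NULLIF is standard SQL and is also available in MySQL). SQL returns NULL as None and keeps it separate from ''; if empty strings should also count as missing, the query can fold them into NULL with NULLIF. Whether H2O then maps those NULLs to NA in string columns is exactly the behavior flagged above as a possible bug, so at best this is a partial workaround.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL table (assumption for a runnable demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, my_string_column VARCHAR(20))")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "")],
)

# SQL keeps NULL (returned as None) and the empty string distinct.
raw = [r[0] for r in conn.execute("SELECT my_string_column FROM my_table ORDER BY id")]

# NULLIF(x, '') returns NULL when x equals '', so empty strings are folded
# into NULL before the data leaves the database.
folded = [
    r[0]
    for r in conn.execute(
        "SELECT NULLIF(my_string_column, '') FROM my_table ORDER BY id"
    )
]
conn.close()

print(raw)     # ['a', None, '']
print(folded)  # ['a', None, None]
```

With H2O, the same expression would go into the select_query passed to h2o.import_sql_select(), e.g. SELECT id, NULLIF(my_string_column, '') AS my_string_column FROM my_table.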

Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.