
In the Python h2o module, how do I specify na_strings when using h2o.import_sql_select() to import data?

I'm trying to import data from a MySQL table into an H2OFrame using h2o.import_sql_select(). I'd like NULL or empty values in VARCHAR columns in the database to be recognized as NAs when imported into the H2OFrame, but they are treated as empty string literals. For numerical columns, however, NULL values are automatically recognized as NAs.

Here's the code I have:

import h2o
h2o.init()  # connect to (or start) a local H2O cluster
select_query = 'SELECT * FROM my_table'
train_data = h2o.import_sql_select("jdbc:mysql://localhost:3306/my_schema", select_query, "username", "password", use_temp_table=False)

train_data['my_string_column'].isna() always results in zeros even for NULL or empty values coming from the database.

However, when I dump the data to CSV, import it using h2o.import_file('/path/to/file.csv', na_strings=['']), and then run train_data['my_string_column'].isna(), the empty values are correctly recognized as NAs because of the na_strings parameter.

Is there some way of specifying na_strings, or some other workaround, to achieve the expected behavior while importing data using h2o.import_sql_select()?

Currently no such feature is implemented. This is simply because, unlike CSV, where there is no difference between an empty string and NULL, SQL has a notion of NULL, so no such feature seemed necessary.

But you are saying that for string columns you are not getting any N/A values in your H2O Frame, which sounds like a bug and I will look into it.
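
Until that is resolved, the CSV route described in the question can be scripted. The following is only a minimal sketch of that workaround, not an official h2o feature: the helper names dump_query_to_csv and load_with_na_strings are hypothetical, and sqlite3 is used purely as a stand-in for whatever DB-API 2.0 driver you use to reach MySQL. The only h2o call involved is h2o.import_file(..., na_strings=['']), as in the question.

```python
import csv
import sqlite3  # stand-in only; any DB-API 2.0 connection (e.g. a MySQL driver) should behave the same way


def dump_query_to_csv(conn, query, csv_path):
    """Write the result of `query` to `csv_path`, emitting NULL as an empty field.

    Hypothetical helper, not part of h2o. Returns the column names.
    """
    cur = conn.cursor()
    cur.execute(query)
    header = [col[0] for col in cur.description]
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cur:
            # NULL and '' both end up as an empty field in the CSV
            writer.writerow(["" if value is None else value for value in row])
    return header


def load_with_na_strings(csv_path):
    """Import the CSV into H2O, treating empty fields as NA.

    Hypothetical helper; assumes h2o is installed and h2o.init() has been called.
    """
    import h2o
    return h2o.import_file(csv_path, na_strings=[""])
```

With a real connection you would call dump_query_to_csv(conn, 'SELECT * FROM my_table', '/path/to/file.csv') and then load_with_na_strings('/path/to/file.csv'). Note that this makes NULL and the empty string indistinguishable, which is what the question asks for but may not suit every table.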
