
H2O frame from pandas: casting column types

I am using H2O for predictive modeling from Python. I have loaded some data from a CSV file with pandas, specifying some column types:

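import os
import pandas as pd

dtype_dict = {'SIT_SSICCOMP': 'object',
              # ... (remaining entries truncated in the original post)
              }

columns_to_drop = ['SIT_TPFRODESI', 'SIT_CITTAACC',
                   # ... (remaining entries truncated in the original post)
                   ]

file_completo = os.path.join(dataDir, "db4modelrisk_" + comp + ".csv")
db4scoring = pd.read_csv(filepath_or_buffer=file_completo, sep=";", encoding='latin1',
                         header=0, infer_datetime_format=True, na_values=[''], keep_default_na=False,
                         # ... (remaining arguments truncated in the original post)
                         )
db4scoring.drop(labels=columns_to_drop, axis=1, inplace=True)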
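For readers without the original file, here is a minimal, self-contained sketch of the same loading step. The CSV text and the column names CODE, AMOUNT and NOTE are hypothetical stand-ins (the original dtype_dict and columns_to_drop are truncated). It shows that pandas honours the dtype dict, for example keeping leading zeros in a code column stored as 'object', and that with keep_default_na=False only the empty string is read as missing.

```python
import io
import pandas as pd

# Hypothetical stand-in for db4modelrisk_<comp>.csv: ';'-separated, one empty cell
csv_text = "CODE;AMOUNT;NOTE\n007;3.5;x\n042;;y\n113;7.25;z\n"

dtype_dict = {"CODE": "object"}   # keep CODE as text so leading zeros survive
columns_to_drop = ["NOTE"]        # hypothetical column to discard

df = pd.read_csv(io.StringIO(csv_text), sep=";", header=0, dtype=dtype_dict,
                 na_values=[""], keep_default_na=False)
df = df.drop(columns=columns_to_drop)

# CODE stays text, AMOUNT becomes float64 because of the missing value
print(df.dtypes)
```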

Then, after setting up an H2O cluster, I import the DataFrame into H2O with db4scoring_h2o = H2OFrame(db4scoring) and convert the categorical predictors to factors, for example:


When I check the data types with db4scoring.dtypes, they are set correctly, but when I import the frame into H2O, H2OFrame performs some unwanted conversions to enum (e.g. from float or int columns). Is there a way to specify the column types in H2OFrame?
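A plausible explanation, stated as an assumption: as far as I can tell from the h2o-py source, H2OFrame(pandas_df) writes the DataFrame out as CSV and lets H2O's parser guess every column type again, so the pandas dtypes themselves are not transmitted (worth confirming for your h2o version). The sketch below uses pandas' own CSV parser as a stand-in for that guessing step (it is not H2O's parser) to show how a CSV round trip drops dtype information.

```python
import io
import pandas as pd

# A frame whose dtypes were set explicitly, like db4scoring in the question
df = pd.DataFrame({
    "code": pd.Series(["1", "2", "3"], dtype="object"),  # digits stored as strings
    "grade": pd.Categorical(["a", "b", "a"]),            # categorical column
})

# Round-trip through CSV text: only the values survive, not the dtypes
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
reparsed = pd.read_csv(buf)

print(df.dtypes.to_dict())        # dtypes as set in pandas
print(reparsed.dtypes.to_dict())  # dtypes re-guessed from the text
```

The "code" column comes back as integers and "grade" is no longer categorical, which is the same kind of re-guessing the question describes on the H2O side.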

Yes, there is. See the H2OFrame documentation: http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/frame.html#h2oframe

You just need to pass the column_types argument when you convert the pandas DataFrame.

Here's a short example:

# imports
import h2o
import numpy as np
import pandas as pd

# create small random pandas df
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=['A', 'B'])

#   A  B
#0  5  0
#1  1  3
#2  4  8
#3  3  9
# ...

# start h2o, convert pandas frame to H2OFrame
h2o.init()
# use column_types dict to set data types
h2o_df = h2o.H2OFrame(df, column_types={'A': 'numeric', 'B': 'enum'})
h2o_df.describe()  # you should now see the desired data types

#       A   B
# type int enum
# ... 
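With many columns, as in db4scoring, writing the column_types dict by hand is tedious. A minimal sketch, assuming you want the pandas dtypes to decide the H2O types: the helper h2o_types_from_pandas below is hypothetical (not part of h2o or pandas) and only emits the type names "numeric", "enum", "string" and "time", which the H2OFrame column_types documentation lists. Mapping booleans and plain object columns to "enum" is a design choice for categorical predictors; pass object_type="string" for free-text columns.

```python
import pandas as pd
from pandas.api import types as ptypes


def h2o_types_from_pandas(df: pd.DataFrame, object_type: str = "enum") -> dict:
    """Hypothetical helper: build an H2OFrame column_types dict from pandas dtypes."""
    col_types = {}
    for name, dtype in df.dtypes.items():
        if isinstance(dtype, pd.CategoricalDtype):
            col_types[name] = "enum"          # pandas categoricals -> H2O factors
        elif ptypes.is_bool_dtype(dtype):
            col_types[name] = "enum"          # True/False treated as two levels
        elif ptypes.is_datetime64_any_dtype(dtype):
            col_types[name] = "time"
        elif ptypes.is_numeric_dtype(dtype):
            col_types[name] = "numeric"       # ints and floats stay numeric
        else:
            col_types[name] = object_type     # object / string columns
    return col_types


df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [0.5, 1.5, 2.5],
    "C": ["x", "y", "x"],
    "D": pd.Categorical(["lo", "hi", "lo"]),
})
print(h2o_types_from_pandas(df))
# {'A': 'numeric', 'B': 'numeric', 'C': 'enum', 'D': 'enum'}
```

With a running cluster, the result would be passed as h2o.H2OFrame(db4scoring, column_types=h2o_types_from_pandas(db4scoring)), then checked with describe() as above.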
