
Python Create Copy of h2o frame

I am very used to the h2o framework in R, but I'm having some trouble adjusting to certain aspects of h2o in Python.

I know that you can create a copy of a pandas DataFrame with the .copy() method, so that updating the new DataFrame doesn't also update the original one. Do h2o frames have similar functionality? What makes it even more confusing is that h2o frames don't seem to follow the usual local/global scoping rules inside functions.

Below is an example. It seems that if I could either create a .copy() of the frame, or stop the function's local environment from updating my global environment, my issue would be solved. If I write the exact same thing in R, it behaves as expected and doesn't modify the column in my original h2o frame, so how can I get Python to work the same way?

##### A FUNCTION TO CHANGE THE VALUE OF A COLUMN
def test_func(train_df, var):
    train_df[var] = train_df[var].log()
    return train_df

##### TRY TO CREATE A NEW COPY OF THE FRAME WITH THE COLUMN CHANGED
new_df = test_func(train_df=old_df, var='target')

##### THE COLUMN HAS BEEN CHANGED IN BOTH new_df AND old_df

If you want to create a copy of an H2OFrame, you can use h2o.deep_copy(data, xid), where xid is the string id you give to the new frame in the H2O backend.

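If you have a frame df and you do

old_df = df
new_df = df

both old_df and new_df will point to the same H2OFrame (df) in the backend, so any change made to old_df will be reflected in new_df. The same applies inside your function: the train_df parameter is just another name for the frame you pass in, so assigning to train_df[var] changes old_df as well.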
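This name-binding behaviour is ordinary Python rather than something specific to h2o: a pandas DataFrame passed into a function is mutated in exactly the same way, and it is the explicit `.copy()` that breaks the link. A minimal sketch, assuming pandas and NumPy are installed; `log_column_in_place` and `log_column_on_copy` are hypothetical names used only for illustration:

```python
import numpy as np
import pandas as pd


def log_column_in_place(train_df, var):
    # train_df is just another name for the caller's DataFrame,
    # so this assignment changes the caller's object too.
    train_df[var] = np.log(train_df[var])
    return train_df


def log_column_on_copy(train_df, var):
    # .copy() builds an independent DataFrame; only the copy is modified.
    out = train_df.copy()
    out[var] = np.log(out[var])
    return out


df = pd.DataFrame({"target": [1.0, 10.0, 100.0]})

# Plain assignment binds another name to the same object; nothing is copied.
old_df = df
new_df = df
print(old_df is new_df)  # True

# Working on a copy leaves the caller's frame alone.
result = log_column_on_copy(df, "target")
print(df["target"].tolist())  # [1.0, 10.0, 100.0]

# Without a copy, every name bound to the frame sees the change.
log_column_in_place(new_df, "target")
print(old_df["target"].tolist())  # log-transformed values
```

`h2o.deep_copy` plays the same role for an H2OFrame that `.copy()` plays here.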

If you want to keep the changes separate, you can do:

new_df = h2o.deep_copy(df, 'new_df')
