
H2O Python: prefix an existing column in an H2O data frame with a string

How do I prefix an existing column in an H2O data frame with a string value in Python? The column is numerical to begin with. I have been able to do this with H2O in R, but I can't get it right in the Python version of h2o.

In R, this seems to work:

library(h2o)
h2o.init()
df = as.h2o(mtcars)
# convert the numeric column to strings, then prepend the prefix
df['mpg'] = h2o.ascharacter(df['mpg'])
df['mpg'] = h2o.sub('', 'hey--------', df['mpg'])
df

However, when I try to do this in Python I get a variety of errors. Sometimes I can convert the numerical column to a string without an error, but then I receive an error when I look at the data frame. I'll post the code if needed. Given that these are the same functions, I imagine it should be relatively easy, but I must be missing something.

EDITED (I didn't answer the original question the first time; answering it now): This is how you would convert a numerical column to a column of string values and then replace those values.

import h2o
prostate = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv"
h2o.init()
df = h2o.import_file(prostate)
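# create an example column with all values equal to 23
df['mpg'] = 23
# convert the numeric column to a string column
df['mpg'] = df['mpg'].ascharacter()
df[1,'mpg'] # see that it is now a string
# replace the first match of '23' in each value with the prefixed string
df['mpg'] = df['mpg'].sub('23', 'please-help-me----23')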
df
Out[16]:
  ID    CAPSULE    AGE    RACE    DPROS    DCAPS    PSA    VOL    GLEASON  mpg
----  ---------  -----  ------  -------  -------  -----  -----  ---------  --------------------
   1          0     65       1        2        1    1.4    0            6  please-help-me----23
   2          0     72       1        3        2    6.7    0            7  please-help-me----23
   3          0     70       1        1        2    4.9    0            6  please-help-me----23
   4          0     76       2        2        1   51.2   20            7  please-help-me----23
   5          0     69       1        1        1   12.3   55.9          6  please-help-me----23
   6          1     71       1        3        2    3.3    0            8  please-help-me----23
   7          0     68       2        4        2   31.9    0            7  please-help-me----23
   8          0     61       2        4        2   66.7   27.2          7  please-help-me----23
   9          0     69       1        1        1    3.9   24            7  please-help-me----23
  10          0     68       2        1        2   13      0            6  please-help-me----23

[380 rows x 10 columns]
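
The replacement above matches the literal '23', which works because every value in the example column is 23. For a column with varied values, the same idea generalizes by matching the start of each string instead. Here is a minimal plain-Python sketch of that logic using the standard `re` module. It is not H2O code: `add_prefix` is a hypothetical helper name, and whether `sub` on an H2O frame accepts a `^` anchor in the same way is an assumption you should check against your H2O version.

```python
import re

def add_prefix(values, prefix):
    """Convert each value to a string and prepend `prefix` to it."""
    as_strings = [str(v) for v in values]
    # '^' matches the empty position at the start of the string, so
    # substituting it once inserts the prefix before the existing text.
    # A function replacement keeps any backslashes in `prefix` literal.
    return [re.sub(r'^', lambda m: prefix, s, count=1) for s in as_strings]

mpg = [21.0, 22.8, 18.7]
print(add_prefix(mpg, 'hey--------'))
# ['hey--------21.0', 'hey--------22.8', 'hey--------18.7']
```

This mirrors the empty-pattern substitution in the R snippet from the question, where the replacement is inserted at the start of each value.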

(Answering the wrong question below:) To rename columns, you have to pass a new list of column names (the same length as your original column list).

df.columns = new_column_list

For example, I can rename the column ID to NEW:

import h2o
prostate = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv"
h2o.init()
df = h2o.import_file(prostate)
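print(df.columns)
# copy the current column names, change the first one, and assign the list back
columns = df.columns
columns[0] = 'NEW'
df.columns = columns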
print(df.columns)

which will show:

Checking whether there is an H2O instance running at http://localhost:54321. connected.
--------------------------  ------------------------------
H2O cluster uptime:         9 hours 31 mins
H2O cluster version:        3.10.4.8
H2O cluster version age:    1 month and 6 days
H2O cluster name:           H2O_from_python_laurend_tzhifp
H2O cluster total nodes:    1
H2O cluster free memory:    3.276 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster status:         locked, healthy
H2O connection url:         http://localhost:54321
H2O connection proxy:
H2O internal security:      False
Python version:             3.5.1 final
--------------------------  ------------------------------
Parse progress: |████████████████████████████████████████████████████████████████████████████| 100%
['ID', 'CAPSULE', 'AGE', 'RACE', 'DPROS', 'DCAPS', 'PSA', 'VOL', 'GLEASON']
['NEW', 'CAPSULE', 'AGE', 'RACE', 'DPROS', 'DCAPS', 'PSA', 'VOL', 'GLEASON']
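
If you would rather rename a column by its name than by its position, the list manipulation can be wrapped in a small helper. This is only a sketch in plain Python: `rename_column` is a hypothetical function, not part of the h2o package, and it works on an ordinary list of names such as the one printed above.

```python
def rename_column(columns, old, new):
    """Return a copy of `columns` with the first occurrence of `old` replaced by `new`."""
    if old not in columns:
        raise ValueError("column %r not found" % old)
    renamed = list(columns)  # copy, so the caller's list is left unchanged
    renamed[renamed.index(old)] = new
    return renamed

columns = ['ID', 'CAPSULE', 'AGE', 'RACE', 'DPROS', 'DCAPS', 'PSA', 'VOL', 'GLEASON']
print(rename_column(columns, 'ID', 'NEW'))
# ['NEW', 'CAPSULE', 'AGE', 'RACE', 'DPROS', 'DCAPS', 'PSA', 'VOL', 'GLEASON']
```

The returned list has the same length as the original, so it can be assigned back with `df.columns = rename_column(df.columns, 'ID', 'NEW')`.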
