How do I prefix the values of an existing column in an H2O data frame with a string in Python? The column is numerical to begin with. I have been able to do this with H2O in R, but I can't get it right in the Python version of h2o.
In R, this seems to work:
h2o.init()
df = as.h2o(mtcars)
df['mpg']=h2o.ascharacter(df['mpg'])
df['mpg']=h2o.sub('','hey--------',df['mpg'])
df
However, when I try to do this in python I get a variety of errors. Sometimes I'm able to adjust the numerical column to a string without an error but then when I go and look at the data frame I receive an error. I'll post the code if needed. Given that they are the same functions I imagine it should be relatively easy but I must be missing something.
EDITED (I didn't answer the original question the first time, so I'm answering it now): This is how you would convert a numerical column to a column of string values and then replace those values.
import h2o
prostate = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv"
h2o.init()
df = h2o.import_file(prostate)
# creating your example column with all values equal to 23
df['mpg'] = 23
df['mpg'] = df['mpg'].ascharacter()
df[1,'mpg'] # see that it is now a string
df['mpg']=df['mpg'].sub('23', 'please-help-me----23')
df
Out[16]:
  ID    CAPSULE    AGE    RACE    DPROS    DCAPS    PSA    VOL    GLEASON  mpg
----  ---------  -----  ------  -------  -------  -----  -----  ---------  --------------------
   1          0     65       1        2        1    1.4      0          6  please-help-me----23
   2          0     72       1        3        2    6.7      0          7  please-help-me----23
   3          0     70       1        1        2    4.9      0          6  please-help-me----23
   4          0     76       2        2        1   51.2     20          7  please-help-me----23
   5          0     69       1        1        1   12.3   55.9          6  please-help-me----23
   6          1     71       1        3        2    3.3      0          8  please-help-me----23
   7          0     68       2        4        2   31.9      0          7  please-help-me----23
   8          0     61       2        4        2   66.7   27.2          7  please-help-me----23
   9          0     69       1        1        1    3.9     24          7  please-help-me----23
  10          0     68       2        1        2     13      0          6  please-help-me----23

[380 rows x 10 columns]
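The two calls used above can be wrapped in a small helper. This is only a sketch: `prefix_known_value` is a hypothetical name, it relies solely on the `ascharacter()` and `sub()` methods already shown, and it assumes you know the value's string form as H2O renders it (here `23`). It does not try to handle a column with many distinct values.

```python
def prefix_known_value(frame, column, value, prefix):
    """Convert `column` to strings and prefix one known value.

    `frame` is assumed to be an H2OFrame (or anything exposing the same
    ascharacter()/sub() methods on its columns). Hypothetical helper,
    not part of the h2o package.
    """
    text = str(value)  # assumption: matches how H2O prints the value
    as_strings = frame[column].ascharacter()
    # sub() replaces the first occurrence of the pattern in each string
    frame[column] = as_strings.sub(text, prefix + text)
    return frame
```

With the frame from the example above, it could be called as `prefix_known_value(df, 'mpg', 23, 'please-help-me----')`.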
(My original answer, which addressed renaming columns rather than the question asked:) To rename columns, you have to pass a new list of column names (the same length as your original column list):
df.columns = new_column_list
For example, I can rename the column ID to NEW:
import h2o
prostate = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv"
h2o.init()
df = h2o.import_file(prostate)
print(df.columns)
# copy the current names so that one entry can be changed
columns = list(df.columns)
columns[0] = 'NEW'
df.columns = columns
print(df.columns)
which will show:
Checking whether there is an H2O instance running at http://localhost:54321. connected.
-------------------------- ------------------------------
H2O cluster uptime: 9 hours 31 mins
H2O cluster version: 3.10.4.8
H2O cluster version age: 1 month and 6 days
H2O cluster name: H2O_from_python_laurend_tzhifp
H2O cluster total nodes: 1
H2O cluster free memory: 3.276 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster status: locked, healthy
H2O connection url: http://localhost:54321
H2O connection proxy:
H2O internal security: False
Python version: 3.5.1 final
-------------------------- ------------------------------
Parse progress: |████████████████████████████████████████████████████████████████████████████| 100%
['ID', 'CAPSULE', 'AGE', 'RACE', 'DPROS', 'DCAPS', 'PSA', 'VOL', 'GLEASON']
['NEW', 'CAPSULE', 'AGE', 'RACE', 'DPROS', 'DCAPS', 'PSA', 'VOL', 'GLEASON']
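The renaming step can also be written so that it looks the column up by name instead of by position. The following is a minimal sketch in plain Python; `renamed_columns` is a hypothetical helper, not an h2o function, and it only builds the new list that would then be assigned to `df.columns`.

```python
def renamed_columns(columns, old, new):
    """Return a copy of `columns` with the name `old` replaced by `new`."""
    if old not in columns:
        raise ValueError("no column named %r" % (old,))
    # build a new list so the caller's list is left untouched
    return [new if name == old else name for name in columns]


names = ['ID', 'CAPSULE', 'AGE']
print(renamed_columns(names, 'ID', 'NEW'))
```

With an H2O frame this would be used as `df.columns = renamed_columns(df.columns, 'ID', 'NEW')`, which keeps the list the same length as the original, as required.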