
Python 3.x: H2OFrame crash when parsing a pandas DataFrame

I'm new to H2O (the Python package). My problem is that I can't figure out how to successfully create an H2OFrame from a pandas DataFrame.

My environment is:

  • Windows 10 Home, build 15063.540, with 16.0 GB memory
  • Java SE Development Kit 8u144 (64-bit)
  • Java SE Runtime Environment (build 1.8.0_144-b01)
  • Anaconda Python 3.5.4

I started the server with h2o.init():

H2O cluster uptime: 19 hours 14 mins
H2O cluster version:    3.14.0.1
H2O cluster version age:    12 days
H2O cluster name:   H2O_from_python_pedro_23i63g
H2O cluster total nodes:    1
H2O cluster free memory:    3.456 Gb
H2O cluster total cores:    4
H2O cluster allowed cores:  4
H2O cluster status: locked, healthy
H2O connection url: http://localhost:54321
H2O connection proxy:   None
H2O internal security:  False
H2O API Extensions: Algos, AutoML, Core V3, Core V4
Python version: 3.5.4 final

I'm trying to create my H2OFrame from the pandas DataFrame train1 with the following command:

hf1 = h2o.H2OFrame(train1)

Crash info:

OSError: Job with key $03017f00000132d4ffffffff$_8ef7ebc5204725b046d7b31ca7194c71 failed with an exception: DistributedException from /127.0.0.1:54321: 'null', caused by java.lang.AssertionError
stacktrace: 
DistributedException from /127.0.0.1:54321: 'null', caused by java.lang.AssertionError
    at water.MRTask.getResult(MRTask.java:478)
    at water.MRTask.getResult(MRTask.java:486)
    at water.MRTask.doAll(MRTask.java:402)
    at water.parser.ParseDataset.parseAllKeys(ParseDataset.java:245)
    at water.parser.ParseDataset.access$000(ParseDataset.java:26)
    at water.parser.ParseDataset$ParserFJTask.compute2(ParseDataset.java:194)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1255)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.AssertionError
    at water.parser.Categorical.addKey(Categorical.java:41)
    at water.parser.FVecParseWriter.addStrCol(FVecParseWriter.java:133)
    at water.parser.CsvParser.parseChunk(CsvParser.java:126)
    at water.parser.ParseDataset$MultiFileParseTask$DistributedParse.map(ParseDataset.java:888)
    at water.MRTask.compute2(MRTask.java:637)
    at water.MRTask.compute2(MRTask.java:591)
    at water.MRTask.compute2(MRTask.java:591)
    at water.MRTask.compute2(MRTask.java:591)
    at water.MRTask.compute2(MRTask.java:591)
    at water.MRTask.compute2(MRTask.java:591)
    at water.MRTask.compute2(MRTask.java:591)
    at water.H2O$H2OCountedCompleter.compute1(H2O.java:1258)
    at water.parser.ParseDataset$MultiFileParseTask$DistributedParse$Icer.compute1(ParseDataset$MultiFileParseTask$DistributedParse$Icer.java)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1254)
    ... 5 more

However, when I tried to create the H2OFrame from only the first 6 rows of the pandas DataFrame, it went well.

hf1 = h2o.H2OFrame(train1.loc[:6,:])
[out] Parse progress: |█████████████████████████████████████████████████████████| 100%

But when I try more than these 6 rows (e.g. 7 rows), it fails again with the same crash info as before:

hf1 = h2o.H2OFrame(train1.loc[:7,:])
[out] Parse progress: | (failed)
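
A side note on the row counts in the two commands above: .loc slices by label and includes the end label, so if train1 has the default integer index (an assumption, since the index is not shown here), train1.loc[:6,:] returns 7 rows and train1.loc[:7,:] returns 8. A minimal sketch with a made-up stand-in DataFrame (df and its columns are hypothetical, not my real data):

```python
import pandas as pd

# Hypothetical stand-in for train1 with a default RangeIndex (0, 1, 2, ...).
df = pd.DataFrame({"feature": list("abcdefghij"), "target": range(10)})

by_label = df.loc[:6, :]      # label-based: includes the row labelled 6
by_position = df.iloc[:6, :]  # position-based: stops before position 6

n_label = len(by_label)
n_position = len(by_position)
print(n_label, n_position)  # 7 6
```

Using .iloc when bisecting makes it easier to tell exactly which row is the first one that triggers the failure.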

What could be wrong here?

Thanks in advance.

Pedro

I added:

h2o.cluster().shutdown()

and that fixed my problem. I may be wrong, but I think a lack of RAM was the issue, so shutting down the previous cluster helped.
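
For anyone who wants the whole sequence in one place, here is a minimal sketch of shutting down the old cluster and starting a fresh one. It assumes that memory pressure from the long-running cluster really was the cause, which I have not confirmed. The function name restart_h2o, the "8G" heap size, and the 5-second wait are illustrative choices of mine; max_mem_size is a parameter of h2o.init(), and prompt=False skips the shutdown confirmation.

```python
import time


def restart_h2o(max_mem_size="8G", wait_seconds=5):
    """Shut down the H2O cluster this session can reach, then start a new one.

    max_mem_size and wait_seconds are illustrative defaults, not values
    taken from my original setup.
    """
    # Imported here so the function can be defined even where h2o is missing.
    import h2o

    # Connect to the running local cluster (or start one if none exists).
    h2o.init()

    # Stop that cluster without the interactive confirmation prompt.
    h2o.cluster().shutdown(prompt=False)

    # Give the JVM a moment to exit before launching a new one.
    time.sleep(wait_seconds)

    # Start a new local cluster with an explicit JVM heap limit.
    h2o.init(max_mem_size=max_mem_size)


# Example call (needs the h2o package and a Java runtime):
# restart_h2o(max_mem_size="8G")
```

After the restart, h2o.H2OFrame(train1) can be retried against the new cluster. Note that shutting down discards every frame and model held by the old cluster.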

I figured out the solution to my own problem, but I'll leave this here in case it helps others who run into the same issue.
