Flink: ERROR parse numeric value format

I'm trying to develop a K-means model in Flink (Scala), using Zeppelin. This is part of my simple code:

// Reading data
val mapped: DataSet[Vector] = data.map { x => DenseVector(x._1, x._2) }

//Create algorithm
val knn = KNN()
  .setK(3)
  .setBlocks(10)
  .setDistanceMetric(SquaredEuclideanDistanceMetric())
  .setUseQuadTree(false)
  .setSizeHint(CrossHint.SECOND_IS_SMALL)
...
// Just to learn, I predict on the same data used to build the model
val result = knn.predict(mapped).collect()

When I print the data or call the predict method, I get this error:

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed.
  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:409)
  at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:95)
  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:382)
  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:369)
  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:344)
  at org.apache.flink.client.RemoteExecutor.executePlanWithJars(RemoteExecutor.java:211)
  at org.apache.flink.client.RemoteExecutor.executePlan(RemoteExecutor.java:188)
  at org.apache.flink.api.java.RemoteEnvironment.execute(RemoteEnvironment.java:172)
  at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896)
  at org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:637)
  at org.apache.flink.api.scala.DataSet.collect(DataSet.scala:547)
  ... 36 elided
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
  at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:822)
  at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:768)
  at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:768)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
  at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.api.common.io.ParseException: Line could not be parsed: '-6.59 -44.68'
ParserError NUMERIC_VALUE_FORMAT_ERROR
Expect field types: class java.lang.Double, class java.lang.Double
in file: /home/borja/flink/kmeans/points
  at org.apache.flink.api.common.io.GenericCsvInputFormat.parseRecord(GenericCsvInputFormat.java:407)
  at org.apache.flink.api.java.io.CsvInputFormat.readRecord(CsvInputFormat.java:110)
  at org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:470)
  at org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:78)
  at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:162)
  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585)
  at java.lang.Thread.run(Thread.java:748)

I don't know whether the problem is in how I load the data or in something related to that.

Thanks for any help! :)

You haven't shown us the code you are using to read and parse the data, which is where the error is occurring. But given the error message, I'll hazard a guess that you are using readCsvFile with data that is delimited by spaces or tabs, and didn't specify the fieldDelimiter (which defaults to comma). With a comma delimiter, the whole line '-6.59 -44.68' is treated as a single field, which cannot be parsed as a Double. If that's the case, see the docs for how to configure the CSV parser.
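If the guess above is right, a minimal sketch of the fix could look like the following. It assumes the file is space-delimited (as the line '-6.59 -44.68' in the error suggests), reuses the path from the error message, and creates its own `env`; in Zeppelin you would probably use the batch environment the interpreter already provides instead. The variable names `data` and `mapped` are taken from the question.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml.math.{DenseVector, Vector}

// Assumption: a fresh batch environment; Zeppelin's Flink interpreter
// normally exposes one already, which could be used in its place.
val env = ExecutionEnvironment.getExecutionEnvironment

// Read two Double columns per line. fieldDelimiter defaults to ",",
// so it is overridden here on the assumption that fields are separated
// by a single space. For tab-separated data, use "\t" instead.
val data: DataSet[(Double, Double)] =
  env.readCsvFile[(Double, Double)](
    "/home/borja/flink/kmeans/points",
    fieldDelimiter = " ")

// Same mapping as in the question.
val mapped: DataSet[Vector] = data.map { x => DenseVector(x._1, x._2) }
```

If some lines use a different separator or contain extra whitespace, the parse error will persist for those lines, so it is worth checking the raw file first.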
