
java.lang.IllegalArgumentException: Illegal sequence boundaries Spark

I am using Azure Databricks and Scala. I want to show() a DataFrame, but I got an error that I cannot understand and would like to solve. The lines of code I have are:

println("----------------------------------------------------------------Printing schema")
df.printSchema()
println("----------------------------------------------------------------Printing dataframe")
df.show()
println("----------------------------------------------------------------Error before")

The standard output is the following; note that the message "----------------------------------------------------------------Error before" never appears.

>     ----------------------------------------------------------------Printing schema
>     root
>      |-- processed: integer (nullable = false)
>      |-- processDatetime: string (nullable = false)
>      |-- executionDatetime: string (nullable = false)
>      |-- executionSource: string (nullable = false)
>      |-- executionAppName: string (nullable = false)
>     
>     ----------------------------------------------------------------Printing dataframe
>     2020-02-18T14:19:00.069+0000: [GC (Allocation Failure) [PSYoungGen: 1497248K->191833K(1789440K)] 2023293K->717886K(6063104K), 0.0823288 secs] [Times: user=0.18 sys=0.02, real=0.09 secs] 
>     2020-02-18T14:19:40.823+0000: [GC (Allocation Failure) [PSYoungGen: 1637209K->195574K(1640960K)] 2163262K->721635K(5914624K), 0.0483384 secs] [Times: user=0.17 sys=0.00, real=0.05 secs] 
>     2020-02-18T14:19:44.843+0000: [GC (Allocation Failure) [PSYoungGen: 1640950K->139092K(1809920K)] 2167011K->665161K(6083584K), 0.0301711 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 
>     2020-02-18T14:19:50.910+0000: Track exception: Job aborted due to stage failure: Task 59 in stage 62.0 failed 4 times, most recent failure: Lost task 59.3 in stage 62.0 (TID 2672, 10.139.64.6, executor 1): java.lang.IllegalArgumentException: Illegal sequence boundaries: 1581897600000000 to 1581811200000000 by 86400000000
>       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage23.processNext(Unknown Source)
>       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$15$$anon$2.hasNext(WholeStageCodegenExec.scala:659)
>       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>       at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>       at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
>       at org.apache.spark.scheduler.Task.run(Task.scala:112)
>       at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1526)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
>     
>     Driver stacktrace:
>     2020-02-18T14:19:50.925+0000: Track message: Process finished with exit code 1. Metric: Writer. Value: 1.0.

It's hard to know exactly without seeing your code, but I had a similar error and the other answer (about int being out of range) led me astray.

The java.lang.IllegalArgumentException you are getting is confusing but is actually quite specific:

Illegal sequence boundaries: 1581897600000000 to 1581811200000000 by 86400000000

This error is complaining that you are using the sequence() Spark SQL function and telling it to go from 1581897600000000 to 1581811200000000 by 86400000000. It's easy to miss because of the big numbers, but this is an instruction to go from a larger number to a smaller number by a positive increment — e.g., from 12 to 6 by 3.
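You can confirm that the boundaries really do run backwards by decoding them yourself: Spark encodes timestamp sequence boundaries as microseconds since the Unix epoch. A minimal, Spark-free Scala sketch (the helper name toDate is mine):

```scala
import java.time.{Instant, ZoneOffset}

// The boundaries from the error message, in microseconds since the Unix epoch.
val startMicros = 1581897600000000L
val stopMicros  = 1581811200000000L
val stepMicros  = 86400000000L // 86,400 seconds = exactly one day

def toDate(micros: Long) =
  Instant.ofEpochMilli(micros / 1000).atZone(ZoneOffset.UTC).toLocalDate

println(s"start = ${toDate(startMicros)}") // 2020-02-17
println(s"stop  = ${toDate(stopMicros)}")  // 2020-02-16
println(s"step  = ${stepMicros / 86400000000L} day(s)")
// start is one day AFTER stop, yet the step is +1 day -- hence the exception.
```

So the failing call was effectively sequence('2020-02-17', '2020-02-16', interval 1 day).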

This is not allowed according to the Databricks documentation:

  • start - an expression. The start of the range.
  • stop - an expression. The end of the range (inclusive).
  • step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it's 1 day and -1 day respectively. If start is greater than stop then the step must be negative, and vice versa.
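The rule in those bullets can be sketched as a small predicate (plain Scala, no Spark required; the function name and the simplified handling of the start = stop edge case are mine):

```scala
// Mirrors the documented rule: the step's sign must match the direction of
// travel from start to stop, otherwise the boundaries are illegal.
def boundariesAreLegal(start: Long, stop: Long, step: Long): Boolean =
  step != 0 && (if (start <= stop) step > 0 else step < 0)

assert(boundariesAreLegal(6, 12, 3))   // ascending, positive step: OK
assert(!boundariesAreLegal(12, 6, 3))  // descending with positive step: illegal
assert(boundariesAreLegal(12, 6, -3))  // descending, negative step: OK

// The exact boundaries from the error message fail the check:
assert(!boundariesAreLegal(1581897600000000L, 1581811200000000L, 86400000000L))
```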

Additionally, I believe the other answer's focus on the int column is misleading. The large numbers in the illegal-sequence error look like they come from a date column. You don't have any DateType columns, but your string columns are named like date columns; presumably you are using them in a sequence() call and they are being coerced into dates.

Your schema is expecting an int, and an int in Java ranges from -2,147,483,648 to 2,147,483,647. The value 1581897600000000 in the error message is far outside that range.

So I would change the schema from int to long.

You can get this error when you call

sequence(start_date, end_date, [interval]) 

on a table where some rows have start_date less than end_date and others have it greater.

When applying this function, all of the date ranges should run in one direction — all ascending or all descending — not mixed.
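One way to avoid the mixed case is to pick the interval's sign per row, so every range runs in a legal direction. A sketch, assuming a DataFrame df with DateType columns start_date and end_date (the column names are illustrative; requires a running SparkSession, so it is not runnable standalone):

```scala
import org.apache.spark.sql.functions.expr

// Choose a +1 day or -1 day step per row so sequence() never sees a
// backward range with a forward step (or vice versa).
val withRanges = df.withColumn(
  "date_range",
  expr("""
    sequence(
      start_date,
      end_date,
      case when start_date <= end_date then interval 1 day
           else interval -1 day end)
  """)
)
```

Alternatively, if descending ranges are not meaningful for your data, filter those rows out or swap the two columns upstream before calling sequence().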
