I have two files: airports.csv and flights.csv. Airports have columns: IATA_CODE AIRPORT CITY STATE COUNTRY LATITUDE LONGITUDE. Flights have columns: YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY TAXI_OUT WHEELS_OFF SCHEDULED_TIME ELAPSED_TIME AIR_TIME DISTANCE WHEELS_ON TAXI_IN SCHEDULED_ARRIVAL ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY LATE_AIRCRAFT_DELAY WEATHER_DELAY.
I read the files in:
val airports = sc.textFile("./archive/airports_b.csv")
val flights = sc.textFile("./archive/flights_b.csv")
Then I created the RDDs, following instructions from different websites:
val airportRDD: RDD[(VertexId, (String))] = airports.map { line =>
val row = line split ','
(row(1).toLong, (row(2))) //1 IATA code, 2 - Airport name
}
val flightsRDD: RDD[Edge[String]] = flights.map {line =>
val row = line split ','
Edge(row(7).toLong, row(8).toLong, row(17)) // 7 Original Airport, 8 Destination Airport, 17 Distance
}
val graph = Graph(airportRDD, flightsRDD)
My next step is to take the first three samples:
println("Airports: " + airportRDD.take(3))
println("Flights: "+ flightsRDD.take(3))
But I am getting the following error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 1 times, most recent failure: Lost task 0.0 in stage 25.0 (TID 58) (host.docker.internal executor driver): java.lang.NumberFormatException: For input string: "ABE"
Could someone advise what's wrong in the code?
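Indexing in Scala is zero-based: the first column is row(0) instead of row(1), and so on. Note also that the exception itself is thrown by .toLong: "ABE" is an IATA code, not a number, so it cannot be parsed as a Long.
Besides, it is easier to load the CSV file with spark.read.option("header", true).csv(hdfs_path) instead of parsing it manually. If the file has no header row, you don't need option("header", true).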
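As a rough illustration of the indexing point, here is a minimal sketch in plain Scala (no Spark required, so it can be pasted into spark-shell or a Scala script). It assumes the column order listed in the question, where IATA_CODE is row(0) and AIRPORT is row(1). Because GraphX vertex IDs are Long values, a non-numeric code needs some numeric stand-in; the helper iataToVertexId below is hypothetical and is only one possible encoding, not part of Spark or GraphX. The sample row is illustrative.

```scala
// Hypothetical helper (not a Spark/GraphX API): packs a short printable-ASCII
// code such as "ABE" into a Long, one byte per character, so that distinct
// codes of up to 7 characters map to distinct IDs.
def iataToVertexId(code: String): Long =
  code.foldLeft(0L)((acc, ch) => acc * 256 + ch.toLong)

// Illustrative lines using the column order given in the question.
val airportLines = Seq(
  "IATA_CODE,AIRPORT,CITY,STATE,COUNTRY,LATITUDE,LONGITUDE",
  "ABE,Lehigh Valley International Airport,Allentown,PA,USA,40.65,-75.44"
)

// Skip the header line, then read columns with zero-based indexes:
// row(0) is IATA_CODE and row(1) is AIRPORT.
val vertices: Seq[(Long, String)] = airportLines.drop(1).map { line =>
  val row = line.split(',')
  (iataToVertexId(row(0)), row(1))
}

vertices.foreach(println)
```

The same function body could be used inside airports.map { ... } on the RDD, provided the header line is filtered out first and the same encoding is applied to the origin and destination codes when building the edges.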