
How to map an RDD of type org.apache.spark.rdd.RDD[Array[String]]?

I am new to Spark and Scala. I have an RDD of type org.apache.spark.rdd.RDD[Array[String]].

Here is the output of myRdd.take(3):

Array(Array(1, 2524474, CBSGPRS, 1, 2015-09-09 10:42:03, 0, 47880, 302001131103734, NAT, "", 502161081073570, "", BLANK, UNK, "", "", "", MV_PVC, BLANK, 1, "", 0, 475078439, 41131;0;0, "", 102651;0;0, 3|3), Array(2, 2524516, CBSGPRS, 1, 2015-09-09 23:42:14, 0, 1260, 302001131104272, NAT, "", 502161081074085, "", BLANK, UNK, "", "", "", MV_PVC, BLANK, 1, "", 0, 2044745984, 3652;0;0, "", 8636;0;0, 3|3), Array(3, 2524545, CBSGPRS, 1, 2015-09-09 14:56:55, 0, 32886, 302001131101629, NAT, "", 502161081071599, "", BLANK, UNK, "", "", "", MV_PVC, BLANK, 1, "", 0, 1956194307, 14164657;0;0, "", 18231194;0;0, 3|3))

I am trying to map it as follows:

var gprsMap = frows.collect().map{ tuple =>
  // bind variables to the tuple
  var (recKey, origRecKey, recTypeId, durSpanId, timestamp, prevConvDur, convDur,
      msisdn, callType, aPtyCellId, aPtyImsi, aPtyMsrn, bPtyNbr, bPtyNbrTypeId,
      bPtyCellId, bPtyImsi, bPtyMsrn, inTrgId, outTrgId, callStatusId, suppSvcId, provChgAmt,
      genFld1, genFld2, genFld3, genFld4, genFld5) = tuple

  var dtm = timestamp.split(" ");
  var idx = timestamp indexOf ' '
  var dt = timestamp slice(0, idx)
  var tm = timestamp slice(idx + 1, timestamp.length)

  // return the results tuple
  ((dtm(0), msisdn, callType, recTypeId, provChgAmt), (convDur))
}

I keep getting this error:

error: object Tuple27 is not a member of package scala.

I am not sure what is causing this error. Can someone help?

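The problem is that Scala 2 only supports tuples with up to 22 fields (Tuple1 to Tuple22), so a tuple pattern with 27 variables makes the compiler look for a scala.Tuple27 class that does not exist. In addition, your frows: RDD[Array[String]] contains Array[String] elements, so the tuple variable in your map function is an Array[String], not a tuple. Therefore it cannot be destructured with a tuple pattern at all, regardless of how many fields it has.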
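If you do want to bind fields by position, Scala's Array sequence pattern (case Array(a, b, c, _*) => ...) is a different technique from a tuple pattern and is not subject to the 22-element limit, because it is backed by Array.unapplySeq rather than a TupleN class. Below is a minimal sketch, assuming the field positions follow the variable order in the question; the helper name toGprsEntryByPattern is hypothetical, and the "" entries in the listing are assumed to be empty strings. Rows with fewer than 22 fields fall through to None instead of throwing a MatchError.

```scala
// Sketch: bind fields of one record with an Array sequence pattern.
// Field positions follow the variable order in the question (0 = recKey, ..., 26 = genFld5).
// The helper name toGprsEntryByPattern is hypothetical.
def toGprsEntryByPattern(row: Array[String]): Option[((String, String, String, String, String), String)] =
  row match {
    // Positions 0..21 are listed explicitly; _* ignores everything after provChgAmt.
    case Array(_, _, recTypeId, _, timestamp, _, convDur, msisdn, callType,
               _, _, _, _, _, _, _, _, _, _, _, _, provChgAmt, _*) =>
      // Keep only the date part of "yyyy-MM-dd HH:mm:ss".
      val date = timestamp.takeWhile(_ != ' ')
      Some(((date, msisdn, callType, recTypeId, provChgAmt), convDur))
    // Anything shorter than 22 fields does not match the pattern above.
    case _ => None
  }

// The first record from myRdd.take(3), written out as an Array[String]
// (assumption: the "" entries are empty strings).
val sample: Array[String] = Array(
  "1", "2524474", "CBSGPRS", "1", "2015-09-09 10:42:03", "0", "47880",
  "302001131103734", "NAT", "", "502161081073570", "", "BLANK", "UNK", "", "", "", "MV_PVC", "BLANK",
  "1", "", "0", "475078439", "41131;0;0", "", "102651;0;0", "3|3")

println(toGprsEntryByPattern(sample))
// prints Some(((2015-09-09,302001131103734,NAT,CBSGPRS,0),47880))
```

Returning an Option keeps malformed rows from crashing the job; they can be dropped later with flatMap.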

What you can do instead is access the elements of the array directly by index:

val recKey = tuple(0)
val timestamp = tuple(4)
...
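Putting the index-based approach together, here is a minimal sketch that produces the same result tuple as the question's code. The index constants in GprsFields and the helper name toGprsEntry are hypothetical names; their values follow the variable order in the question's tuple pattern. A local Seq stands in for the RDD so the example runs without Spark, and the "" entries are assumed to be empty strings.

```scala
// Hypothetical index constants, following the variable order in the question's pattern.
object GprsFields {
  val RecTypeId  = 2
  val Timestamp  = 4
  val ConvDur    = 6
  val Msisdn     = 7
  val CallType   = 8
  val ProvChgAmt = 21
}

// row is what the question calls tuple: one Array[String] record.
def toGprsEntry(row: Array[String]): ((String, String, String, String, String), String) = {
  import GprsFields._
  // "2015-09-09 10:42:03" -> Array("2015-09-09", "10:42:03")
  val dtm = row(Timestamp).split(" ")
  // Same shape as the question's result: ((date, msisdn, callType, recTypeId, provChgAmt), convDur)
  ((dtm(0), row(Msisdn), row(CallType), row(RecTypeId), row(ProvChgAmt)), row(ConvDur))
}

// A local Seq stands in for the RDD (first record from the listing above).
val rows: Seq[Array[String]] = Seq(
  Array("1", "2524474", "CBSGPRS", "1", "2015-09-09 10:42:03", "0", "47880",
    "302001131103734", "NAT", "", "502161081073570", "", "BLANK", "UNK", "", "", "", "MV_PVC", "BLANK",
    "1", "", "0", "475078439", "41131;0;0", "", "102651;0;0", "3|3"))

val gprsMap = rows.map(toGprsEntry)
println(gprsMap)
// prints List(((2015-09-09,302001131103734,NAT,CBSGPRS,0),47880))
```

With Spark available, the same function should also work as frows.map(toGprsEntry) (subject to the usual closure-serialization rules), which keeps the transformation on the executors, whereas frows.collect().map(...) first copies every row to the driver.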

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please include the site URL or the original address. For questions, contact yoyou2525@163.com.
