Format a (K, (v, w)) pair in a Spark RDD
I have an RDD like this:
val custFile = sc.textFile("custInfo.txt").map(line => line.split('|'))
val custPrd = custFile.map(a => (a(0), ((a(1)), (a(2), a(3), a(4), a(5), a(6), a(7), a(8)))))
val custGrp = custPrd.groupByKey
custGrp.saveAsTextFile("custinfo2")
This produces:
(1104,CompactBuffer((S_SAVG,(1,1,1,1,1,1,1)), (CN_SAVG,(4,4,1,1,4,1,1))))
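To make the shape of the grouped data explicit, here is a minimal sketch that runs the same split/map/group steps on a local Scala collection instead of an RDD, so it works in a plain Scala REPL without Spark. The two input lines are hypothetical, reconstructed from the output above, and `groupBy` plus dropping the key is only an approximation of `RDD.groupByKey`:

```scala
// Two hypothetical input lines, reconstructed from the groupByKey output above
// (the real contents of custInfo.txt are not shown in the question).
val lines = Seq(
  "1104|S_SAVG|1|1|1|1|1|1|1",
  "1104|CN_SAVG|4|4|1|1|4|1|1"
)

// The same split/map steps as the RDD code, applied to a local Seq.
// split('|') takes a Char, so the pipe is matched literally rather than as a regex.
val custPrd = lines
  .map(_.split('|'))
  .map(a => (a(0), (a(1), (a(2), a(3), a(4), a(5), a(6), a(7), a(8)))))

// groupBy followed by dropping the key from each value mimics RDD.groupByKey:
// every key maps to a collection of (String, Tuple7) values, not to one (v, w) pair.
val custGrp = custPrd
  .groupBy(_._1)
  .map { case (k, pairs) => (k, pairs.map(_._2)) }

custGrp.foreach(println)
```

The important point is the value type: after grouping, each key carries a collection of `(String, Tuple7)` elements (a `CompactBuffer` in Spark, a `List` here), which is what any pattern in a later `map` has to match.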
How can I use something like this:
custPrdGrp.map{case (k, vals) => {val valsString = vals.mkString(", "); s"{$k:, {$valsString}}" }}
to format the (k, (v, w)) pair? I tried this, but got an error:
val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
<console>:27: error: constructor cannot be instantiated to expected type;
found : (T1, T2)
required: Iterable[(String, (String, String, String, String, String, String, String))]
val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
^
<console>:27: error: not found: value v
val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
^
<console>:27: error: not found: value w
val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
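The compiler error says the second element of each record is an `Iterable[(String, (String, ...))]`, so the pattern `(k, (v, w))` tries to destructure a whole collection as if it were a single pair, and `v` and `w` never get bound. A minimal sketch of the usual fix, assuming the grouped shape shown above: match `(k, vals)` first, then destructure each `(v, w)` element inside `vals`. The `Detail` alias and the sample record are hypothetical, and the output format is only for illustration:

```scala
// Hypothetical alias for the seven-field tuple produced by the question's map step.
type Detail = (String, String, String, String, String, String, String)

// One grouped record, shaped like an element of custGrp (hypothetical sample values).
val record: (String, Iterable[(String, Detail)]) =
  ("1104", Seq(
    ("S_SAVG", ("1", "1", "1", "1", "1", "1", "1")),
    ("CN_SAVG", ("4", "4", "1", "1", "4", "1", "1"))))

// case (k, (v, w)) fails because the second element is an Iterable, not a Tuple2.
// Match the Iterable as a whole, then destructure each (v, w) element inside it.
val formatted = record match {
  case (k, vals) =>
    val valsString = vals
      .map { case (v, w) => s"$v: ${w.productIterator.mkString(", ")}" }
      .mkString("; ")
    s"'$k'| [$valsString]"
}

println(formatted)
```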
I would like each record to look like this:
('1104'|{'S_SAVG': {a: '1', b: '1', c: '1', d: '1', e: '1', f: '1', g: '1'}, 'CN_SAVG': {a: '4', b: '4', c: '1', d: '1', e: '4', f: '1', g: '1'}})
Well, there are quite a few details to get right here, but something like this should work:
val keys = List("a", "b", "c", "d", "e", "f", "g")
val custRep = custGrp.map { case (k, vals) =>
  val valsString = vals.map { case (val1, val2) =>
    val pairs = keys
      // Create letter: 'number' pairs from the seven tuple fields
      .zip(val2.productIterator.map(_.toString).toSeq)
      .map { case (letter, v) => s"$letter: '$v'" }
      // Join into a single string
      .mkString(", ")
    // Prefix with the product code ("key")
    s"'$val1': {$pairs}"
  }
  // Combine the formatted values
  val valsComb = valsString.mkString(", ")
  // Create the final string
  s"('$k'|{$valsComb})"
}
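One way to check this formatting logic without a cluster is to pull the body of the `map` into a plain function and test it on local data. The sketch below does that; `formatRecord`, the `Detail` alias, and the sample values are hypothetical names and data, not part of the original answer, and the logic is the same zip/mkString approach as above:

```scala
// Hypothetical alias for the seven-field tuple in each grouped value.
type Detail = (String, String, String, String, String, String, String)

val keys = List("a", "b", "c", "d", "e", "f", "g")

// Pure formatting function: the same logic as the answer's map body,
// but callable (and testable) without Spark.
def formatRecord(k: String, vals: Iterable[(String, Detail)]): String = {
  val valsComb = vals.map { case (code, detail) =>
    // Pair each letter with the matching tuple field: a: '1', b: '1', ...
    val pairs = keys
      .zip(detail.productIterator.map(_.toString).toSeq)
      .map { case (letter, v) => s"$letter: '$v'" }
      .mkString(", ")
    s"'$code': {$pairs}"
  }.mkString(", ")
  s"('$k'|{$valsComb})"
}

// Hypothetical sample values matching the groupByKey output in the question.
val sample: Iterable[(String, Detail)] = Seq(
  ("S_SAVG", ("1", "1", "1", "1", "1", "1", "1")),
  ("CN_SAVG", ("4", "4", "1", "1", "4", "1", "1"))
)

println(formatRecord("1104", sample))
```

On this sample it produces exactly the line the question asks for. In spark-shell the function should then be usable as `custGrp.map { case (k, vals) => formatRecord(k, vals) }`; in a compiled job, keeping it in a top-level `object` avoids capturing a non-serializable enclosing class in the closure.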
You can simplify things by building the right data structure in the first place, for example by using Maps instead of tuples:
Map("S_SAVG" -> Map("a" -> "1", "b" -> "1", ...), ...)
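A minimal sketch of that idea, assuming the same letter keys and grouped shape as above: convert each `(code, Tuple7)` value into a nested map once, after which formatting is just nested `mkString` calls. `toNested` and `render` are hypothetical helper names. `ListMap` is used instead of a plain `Map` because an ordinary immutable `Map` with seven entries does not guarantee iteration order, and the output should list `a` through `g` in order:

```scala
import scala.collection.immutable.ListMap

val keys = List("a", "b", "c", "d", "e", "f", "g")

// Convert grouped (code, tuple) values into code -> (letter -> value).
// ListMap preserves insertion order, so a..g and the product codes stay in order.
def toNested(vals: Iterable[(String, Product)]): ListMap[String, ListMap[String, String]] =
  ListMap(vals.toSeq.map { case (code, detail) =>
    code -> ListMap(keys.zip(detail.productIterator.map(_.toString).toSeq): _*)
  }: _*)

// With the data already keyed by name, rendering is just nested mkString calls.
def render(k: String, nested: ListMap[String, ListMap[String, String]]): String = {
  val body = nested.map { case (code, fields) =>
    fields.map { case (letter, v) => s"$letter: '$v'" }.mkString(s"'$code': {", ", ", "}")
  }.mkString(", ")
  s"('$k'|{$body})"
}

// Hypothetical sample values matching the groupByKey output in the question.
val nested = toNested(Seq(
  ("S_SAVG", ("1", "1", "1", "1", "1", "1", "1")),
  ("CN_SAVG", ("4", "4", "1", "1", "4", "1", "1"))
))

println(nested("CN_SAVG")("e")) // prints 4
println(render("1104", nested))
```

A side benefit of the map form is that individual fields can be looked up by name (as in `nested("CN_SAVG")("e")`) instead of by tuple position.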