What is the meaning of reduceByKey(_ ++ _)?
Recently I had a scenario where I needed to store data as key-value pairs, and I came across the function reduceByKey(_ ++ _). This looks like shorthand syntax, and I am not able to understand what it actually means.
For example, reduceByKey(_ + _) means reduceByKey((a,b) => a + b).
So what does reduceByKey(_ ++ _) mean?
I am able to create key-value pairs out of the data using reduceByKey(_ ++ _):
val y = sc.textFile("file:///root/My_Spark_learning/reduced.txt")
y.map(value=>value.split(","))
.map(value=>(value(0),value(1),value(2)))
.collect
.foreach(println)
(1,2,3)
(1,3,4)
(4,5,6)
(7,8,9)
y.map(value=>value.split(","))
.map(value=>(value(0),Seq(value(1),value(2))))
.reduceByKey(_ ++ _)
.collect
.foreach(println)
(1,List(2, 3, 3, 4))
(4,List(5, 6))
(7,List(8, 9))
reduceByKey(_ ++ _) translates to reduceByKey((a,b) => a ++ b).
++ is a method defined on List (and on Scala collections in general, including the Seq used in the question) that concatenates another collection onto it.
So, for key 1 in the sample data, a will be List(2,3) and b will be List(3,4), and hence the concatenation List(2,3) ++ List(3,4) yields List(2,3,3,4).
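The equivalence described above can be checked without Spark. The following is a minimal sketch in plain Scala; the names ConcatDemo, concatShort and concatLong are made up for illustration and do not come from the question.

```scala
object ConcatDemo {
  // Placeholder syntax: each underscore stands for one parameter, in order.
  val concatShort: (Seq[String], Seq[String]) => Seq[String] = _ ++ _

  // The same function written out with named parameters.
  val concatLong: (Seq[String], Seq[String]) => Seq[String] = (a, b) => a ++ b

  def main(args: Array[String]): Unit = {
    val a = Seq("2", "3") // values of the first record with key 1
    val b = Seq("3", "4") // values of the second record with key 1

    println(concatShort(a, b)) // List(2, 3, 3, 4)
    println(concatLong(a, b))  // List(2, 3, 3, 4)
  }
}
```

Both functions produce the same list, which is why the shorthand and the explicit lambda can be swapped freely inside reduceByKey.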
reduceByKey(_ ++ _) is equivalent to reduceByKey((x,y) => x ++ y).
reduceByKey takes a function of two parameters, applies it to pairs of values that share the same key, and returns the combined value.
First, the map step wraps the values of each record in a collection (a Seq here); ++ then simply joins two collections together, combining the elements of both.
For each key, it keeps appending to the list. In your case, with 1 as the key, x will be List(2,3) and y will be List(3,4), and ++ will combine both into List(2,3,3,4).
If you had another record like (1,4,5), then x would be List(2,3,3,4) in this case and y would be List(4,5), and the result would be List(2,3,3,4,4,5). Which list arrives as x and which as y depends on the order in which Spark combines the values, so the element order is not guaranteed.
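To see the accumulation step by step without a cluster, here is a rough local sketch that imitates the behaviour with plain Scala collections. reduceByKeyLocal is a hypothetical helper written for this illustration, not part of Spark, and the extra line "1,4,5" is the additional record assumed above. Unlike Spark, this local version always combines values in input order.

```scala
object ReduceByKeySketch {
  // Hypothetical local stand-in for RDD.reduceByKey:
  // group the pairs by key, then combine each group's values pairwise with f.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  // Sample data from the question plus the assumed extra record 1,4,5.
  val lines = Seq("1,2,3", "1,3,4", "4,5,6", "7,8,9", "1,4,5")

  // Same shape as in the question: (key, Seq(value1, value2)).
  val pairs: Seq[(String, Seq[String])] =
    lines.map(_.split(",")).map(v => (v(0), Seq(v(1), v(2))))

  val result: Map[String, Seq[String]] = reduceByKeyLocal(pairs)(_ ++ _)

  def main(args: Array[String]): Unit =
    // Sorted by key only to make the printed order stable.
    result.toSeq.sortBy(_._1).foreach(println)
  // (1,List(2, 3, 3, 4, 4, 5))
  // (4,List(5, 6))
  // (7,List(8, 9))
}
```

For key 1 the function is applied twice: first to List(2,3) and List(3,4), then to that intermediate result and List(4,5).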