How do I sum a set of vectors and produce a new vector in Spark?

I am using Spark's Java API and reading a lot of data with the following schema:

profits (Array of Double values):
--------------------------------- 
[1.0,2.0,3.0] 
[2.0,3.0,4.0] 
[4.0,6.0]

Once I have a DataFrame, I want to compute a new vector that is the sum of all the vectors:

Result:
[7.0,11.0,7.0]

I have seen some examples online of doing this in Scala and Python, but nothing for Java.

val withIndex = profits.flatMap(_.zipWithIndex) // assuming profits is an RDD[Array[Double]]; each row yields (a,0),(b,1),(c,2)

We need to use the index as the key:

val indexKey = withIndex.map{case (k,v) => (v,k)}  //((0,a),(1,b),(2,c))

Finally,

val counts = indexKey.reduceByKey(_ + _) // one (index, sum) pair per position
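
Since the question asks for Java, here is a rough sketch of the same idea (pair each value with its position, use the position as the key, add up the values per key) in plain Java, without any Spark dependency so it can be run as-is. The class name VectorSum and the method sumByIndex are made up for illustration, and a vector shorter than the others is assumed to simply contribute nothing to the missing positions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Spark: sums vectors position by position.
final class VectorSum {

    static double[] sumByIndex(List<double[]> vectors) {
        // TreeMap keeps the keys (positions) in ascending order.
        Map<Integer, Double> sums = new TreeMap<>();
        for (double[] v : vectors) {
            for (int i = 0; i < v.length; i++) {
                // (index, value) pair keyed by index; merge adds values
                // that share a key, much like reduceByKey with addition.
                sums.merge(i, v[i], Double::sum);
            }
        }
        // Positions are contiguous from 0, so the ordered values form the result.
        return sums.values().stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        List<double[]> profits = Arrays.asList(
                new double[] {1.0, 2.0, 3.0},
                new double[] {2.0, 3.0, 4.0},
                new double[] {4.0, 6.0});
        System.out.println(Arrays.toString(sumByIndex(profits))); // prints [7.0, 11.0, 7.0]
    }
}
```

With the Spark Java API, the same shape would presumably be expressed with JavaRDD.flatMapToPair to emit the (index, value) pairs and JavaPairRDD.reduceByKey to add them; that variant is not shown or tested here.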
