Difference between RowMatrix and Matrix in Apache Spark?
I would like to know the basic difference between the RowMatrix and Matrix classes available in Apache Spark.
A slightly more precise question would be: what is the difference between mllib.linalg.Matrix and mllib.linalg.distributed.DistributedMatrix?
Matrix is a trait that represents local matrices, which reside in the memory of a single machine. For now there are two basic implementations: DenseMatrix and SparseMatrix.
DistributedMatrix is a trait that represents distributed matrices built on top of RDDs. RowMatrix is a subclass of DistributedMatrix that stores data row-wise, without meaningful row ordering. There are other implementations of DistributedMatrix (such as IndexedRowMatrix, CoordinateMatrix and BlockMatrix), each with its own storage strategy and specific set of methods.
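To make the local/distributed distinction concrete, here is a Spark-free toy model in plain Python. It is only an illustration under loud assumptions: the names `local_matrix`, `partitions`, `column_sums` and `distributed_column_sums` are hypothetical, plain lists stand in for the RDD, and none of this is the Spark API. It shows why a row-oriented distributed matrix can do without a meaningful row order: column-wise aggregates come out the same however the rows are split or shuffled.

```python
# Toy model only: plain lists stand in for Spark's local and distributed matrices.

# "Local" matrix: every row lives in one in-memory structure on one machine.
local_matrix = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

# "Distributed" row-oriented matrix: the same rows, split into partitions
# (think: one list per worker), in no particular order.
partitions = [
    [[5.0, 6.0]],
    [[1.0, 2.0], [3.0, 4.0]],
]


def column_sums(rows):
    """Sum each column of a list of equal-length rows."""
    return [sum(col) for col in zip(*rows)]


def distributed_column_sums(parts):
    """Aggregate per partition first, then combine the partial results."""
    partials = [column_sums(rows) for rows in parts if rows]
    return column_sums(partials)


local_result = column_sums(local_matrix)
distributed_result = distributed_column_sums(partitions)
print(local_result, distributed_result)
```

Anything that depends on row position (for instance "give me row 2") would need explicit row indices, which this toy deliberately leaves out.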
See, for example, Matrix Multiplication in Apache Spark.
This comes down somewhat to the idioms of the language, framework, or discipline you are using, but in computer science an array is a one-dimensional "list" of "things" that can be referenced by their position in the list. One of the things that can be in the list is another array, which lets you build arrays of arrays (of arrays of arrays ...), giving you a data set with an arbitrary number of dimensions.
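A minimal sketch of that nesting in Python, using lists as arrays. The variable names and the `depth` helper are made up for illustration.

```python
# A one-dimensional array: elements are addressed by position.
row = [10, 20, 30]

# An array whose elements are arrays: two dimensions.
grid = [[1, 2, 3], [4, 5, 6]]

# One more level of nesting: three dimensions.
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]


def depth(value):
    """Count nesting levels by following the first element (hypothetical helper)."""
    levels = 0
    while isinstance(value, list):
        levels += 1
        value = value[0] if value else None
    return levels


print(row[1], grid[1][2], cube[1][0][1])
print(depth(row), depth(grid), depth(cube))
```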
A matrix comes from linear algebra and is a two-dimensional representation of data (which can be represented by an array of arrays) that comes with a powerful set of mathematical operations for manipulating the data in interesting ways. While arrays can vary in size, the width and height of a matrix are generally known from the specific type of operations you are going to perform.
Matrices are used extensively in 3D graphics and physics engines because they are a fast, convenient way of representing transformation and acceleration data for objects in three dimensions.
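As a small illustration of such an operation, the sketch below multiplies a matrix by a vector in plain Python to scale and then move a 3D point with 4x4 homogeneous transformation matrices. The function name `mat_vec` and the specific matrices are assumptions chosen for the example, not taken from any particular engine.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    # The shapes must agree: this is where a matrix's known width matters.
    if any(len(r) != len(vector) for r in matrix):
        raise ValueError("each matrix row must be as long as the vector")
    return [sum(a * b for a, b in zip(r, vector)) for r in matrix]


# 4x4 homogeneous matrix that moves a 3D point by (+10, 0, -5).
translate = [
    [1, 0, 0, 10],
    [0, 1, 0, 0],
    [0, 0, 1, -5],
    [0, 0, 0, 1],
]

# 4x4 matrix that doubles every coordinate.
scale = [
    [2, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 1],
]

point = [1, 2, 3, 1]  # (x, y, z) plus the homogeneous coordinate 1

scaled = mat_vec(scale, point)
moved = mat_vec(translate, scaled)
print(scaled, moved)
```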
Array: a collection of homogeneous elements.
Matrix: a simple arrangement of rows and columns.
The two are different things in different domains. In computer programming, however, a collection of single-dimension arrays can be called a matrix. You can represent a 2D array (i.e., a collection of single-dimension arrays) in matrix form.
Example
A[2][3]: this means A is a collection of 2 single-dimension arrays, each of size 3.
A[1,1] A[1,2] A[1,3] // This is a single-dimensional array
A[2,1] A[2,2] A[2,3] // This is another single-dimensional array
// The collection is a multi-dimensional, or 2D, array.
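The same layout in Python, as a hedged sketch: a list of two lists stands in for A[2][3], and the helper `at` (a made-up name) translates the 1-based A[i,j] notation used above to Python's 0-based indexing.

```python
# A is a collection of 2 one-dimensional arrays, each of size 3.
A = [
    ["A[1,1]", "A[1,2]", "A[1,3]"],  # first one-dimensional array
    ["A[2,1]", "A[2,2]", "A[2,3]"],  # second one-dimensional array
]


def at(array, i, j):
    """Look up element (i, j) using 1-based indices, as in the notation above."""
    return array[i - 1][j - 1]


print(len(A), len(A[0]))
print(at(A, 2, 3))
```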