
Difference between RowMatrix and Matrix in Apache Spark?

Original post: 2016-02-19 05:09:16 · Tags: java / apache-spark / apache-spark-mllib

I would like to know the basic difference between the RowMatrix and Matrix classes available in Apache Spark.

2 Answers

A slightly more precise question would be: what is the difference between mllib.linalg.Matrix and mllib.linalg.distributed.DistributedMatrix?

Matrix is a trait representing local matrices, which reside in the memory of a single machine. For now there are two basic implementations: DenseMatrix and SparseMatrix.
DistributedMatrix is a trait representing distributed matrices built on top of RDDs. RowMatrix is an implementation of DistributedMatrix that stores data row-wise, without meaningful row indices. There are other implementations of DistributedMatrix (such as IndexedRowMatrix, CoordinateMatrix and BlockMatrix), each with its own storage strategy and specific set of methods. See, for example, Matrix Multiplication in Apache Spark.

This comes down somewhat to the idioms of the language, framework, or discipline you're using, but in computer science an array is a one-dimensional "list" of "things" that can be referenced by their position in the list. One of the things that can be in the list is another array, which lets you make arrays of arrays (of arrays of arrays ...), giving you a data set with arbitrarily many dimensions.

A matrix comes from linear algebra and is a two-dimensional representation of data (which can be represented by an array of arrays) that comes with a powerful set of mathematical operations for manipulating the data in interesting ways. While arrays can vary in size, the width and height of a matrix are generally known, based on the specific type of operations you're going to perform.
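To make the "array of arrays plus mathematical operations" idea concrete, here is a minimal sketch in plain Java. The class name MatrixMultiplyDemo and the helper multiply are hypothetical names chosen for illustration; they are not part of any library, and the sketch assumes both inputs are non-empty and rectangular.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Spark or any library.
class MatrixMultiplyDemo {

    // Multiplies an (n x m) matrix by an (m x p) matrix.
    // Both matrices are stored as arrays of arrays (row by row).
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        int m = b.length;
        int p = b[0].length;
        // Unlike a general array of arrays, a matrix has fixed dimensions,
        // so the inner dimensions must agree before multiplying.
        for (double[] row : a) {
            if (row.length != m) {
                throw new IllegalArgumentException("inner dimensions must match");
            }
        }
        double[][] c = new double[n][p];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < m; k++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // prints [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(Arrays.deepToString(multiply(a, b)));
    }
}
```

The dimension check is the part that separates the two ideas: the storage is an ordinary array of arrays, but the operation is only defined when the shapes line up.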

Matrices are used extensively in 3D graphics and physics engines because they are a fast, convenient way of representing transformation and acceleration data for objects in three dimensions.
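As a rough illustration of a transformation stored as a matrix, the sketch below builds a 3x3 rotation about the z-axis and applies it to a point. The names RotationDemo, rotationZ and apply are hypothetical and chosen for this example only; real graphics engines typically use their own matrix types, often 4x4.

```java
// Hypothetical illustration class; not taken from any graphics library.
class RotationDemo {

    // Builds the standard 3x3 matrix for a rotation of theta radians about the z-axis.
    static double[][] rotationZ(double theta) {
        double c = Math.cos(theta);
        double s = Math.sin(theta);
        return new double[][] {
            {c, -s, 0},
            {s,  c, 0},
            {0,  0, 1}
        };
    }

    // Applies matrix m to column vector v (matrix-vector product).
    static double[] apply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < v.length; j++) {
                out[i] += m[i][j] * v[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Rotating the point (1, 0, 0) by 90 degrees about z gives
        // approximately (0, 1, 0), up to floating-point rounding.
        double[] rotated = apply(rotationZ(Math.PI / 2), new double[] {1, 0, 0});
        System.out.println(rotated[0] + " " + rotated[1] + " " + rotated[2]);
    }
}
```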

Array: a collection of homogeneous elements.

Matrix: a simple arrangement of rows and columns.

The two are different things in different domains. But in computer programming, a collection of single-dimension arrays can be termed a matrix. You can represent a 2D array (i.e., a collection of single-dimension arrays) in matrix form.

Example

A[2][3]: this means A is a collection of 2 single-dimension arrays, each of size 3.

A[1,1] A[1,2] A[1,3] // This is a single-dimensional array

A[2,1] A[2,2] A[2,3] // This is another single-dimensional array

// The collection is a multi-dimensional, or 2D, array.
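The same layout can be sketched in Java. Note that Java indexes from 0, so the element written A[1,1] above corresponds to a[0][0] here. The class name TwoDArrayDemo and the method build are hypothetical, and the stored values (row number followed by column number) are chosen only to make each position easy to recognise.

```java
import java.util.Arrays;

// Hypothetical illustration class for the A[2][3] example.
class TwoDArrayDemo {

    // Builds a 2 x 3 array: a collection of 2 single-dimension arrays, each of size 3.
    static int[][] build() {
        int[][] a = new int[2][3];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                // Encode the 1-based position, e.g. 23 for row 2, column 3.
                a[i][j] = 10 * (i + 1) + (j + 1);
            }
        }
        return a;
    }

    public static void main(String[] args) {
        int[][] a = build();
        System.out.println(Arrays.toString(a[0])); // prints [11, 12, 13]
        System.out.println(Arrays.toString(a[1])); // prints [21, 22, 23]
    }
}
```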