
Difference between RowMatrix and Matrix in Apache Spark?

I would like to know the basic difference between the RowMatrix and Matrix classes available in Apache Spark.

A slightly more precise question would be: what is the difference between mllib.linalg.Matrix and mllib.linalg.distributed.DistributedMatrix?

  • Matrix is a trait that represents local matrices, which reside in the memory of a single machine. For now there are two basic implementations: DenseMatrix and SparseMatrix.
  • DistributedMatrix is a trait that represents distributed matrices built on top of RDDs. RowMatrix is a subclass of DistributedMatrix that stores data row-wise, without meaningful row ordering. There are other implementations of DistributedMatrix (such as IndexedRowMatrix, CoordinateMatrix and BlockMatrix), each with its own storage strategy and specific set of methods. See, for example, Matrix Multiplication in Apache Spark.
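
To make the local/distributed distinction concrete, here is a toy model in plain Python. It is not the Spark API: the names LocalDenseMatrix and gramian are made up for illustration, and lists of lists stand in for RDD partitions. The column-major layout mirrors how DenseMatrix stores its values, as far as I understand it. The second half shows why a row-wise store can get away without meaningful row ordering: a quantity like A^T A is a sum over rows, so it comes out the same however the rows are scattered.

```python
# Toy model in plain Python; none of these names are Spark APIs.

class LocalDenseMatrix:
    """A local matrix: every value lives in one list on one machine."""

    def __init__(self, num_rows, num_cols, values):
        if len(values) != num_rows * num_cols:
            raise ValueError("values must contain num_rows * num_cols entries")
        self.num_rows = num_rows
        self.num_cols = num_cols
        self.values = list(values)  # column-major order

    def get(self, i, j):
        # In a column-major layout, entry (i, j) sits at i + j * num_rows.
        return self.values[i + j * self.num_rows]


def gramian(partitions, num_cols):
    """Compute A^T A for rows scattered across partitions.

    Each partition is a list of rows, standing in for a chunk of an RDD.
    The result is a sum over rows, so it does not depend on row order.
    """
    total = [[0.0] * num_cols for _ in range(num_cols)]
    for part in partitions:
        for row in part:
            for a in range(num_cols):
                for b in range(num_cols):
                    total[a][b] += row[a] * row[b]
    return total


# The 3 x 2 matrix [[1, 2], [3, 4], [5, 6]], stored column by column.
local = LocalDenseMatrix(3, 2, [1.0, 3.0, 5.0, 2.0, 4.0, 6.0])

# The same three rows, split across two "partitions" in two different orders.
rows_a = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
rows_b = [[[5.0, 6.0]], [[3.0, 4.0], [1.0, 2.0]]]

same = gramian(rows_a, 2) == gramian(rows_b, 2)  # True
```

Anything that does depend on which row is which (for example, looking up "row 7") needs an index stored alongside each row, which is the gap that IndexedRowMatrix is meant to fill.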

This comes down partly to the idioms of the language, framework or discipline you're using, but in computer science an array is a one-dimensional "list" of "things" that can be referenced by their position in the list. One of the things that can be in the list is another array, which lets you make arrays of arrays (of arrays of arrays ...), giving you a data set with arbitrarily many dimensions.

A matrix comes from linear algebra and is a two-dimensional representation of data (which can be represented by an array of arrays) that comes with a powerful set of mathematical operations for manipulating the data in interesting ways. While arrays can vary in size, the width and height of a matrix are generally known from the specific type of operation you're going to perform.
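
As a rough sketch of that point, the Python below treats a list of lists as a matrix and multiplies two of them. The helper names shape and matmul are my own, not from any library. Note how the operation itself dictates the dimensions: an n x k matrix can only be multiplied by a k x m one.

```python
def shape(m):
    """Return (rows, cols) of a rectangular list of lists, or raise ValueError."""
    if not m or not m[0]:
        raise ValueError("matrix must have at least one row and one column")
    cols = len(m[0])
    if any(len(row) != cols for row in m):
        raise ValueError("all rows must have the same length")
    return len(m), cols


def matmul(a, b):
    """Multiply a (n x k) by b (k x m), checking that the shapes are compatible."""
    n, k = shape(a)
    k2, m = shape(b)
    if k != k2:
        raise ValueError(f"cannot multiply {n}x{k} by {k2}x{m}")
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


a = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
b = [[1, 0],
     [0, 1],
     [1, 1]]             # 3 x 2
product = matmul(a, b)   # 2 x 2: [[4, 5], [10, 11]]
```

A plain array of arrays would happily accept rows of different lengths; it is the shape check that turns it into something you can do linear algebra on.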

Matrices are used extensively in 3D graphics and physics engines because they are a fast, convenient way of representing transformation and acceleration data for objects in three dimensions.
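
For a small taste of what that looks like, here is a minimal Python sketch that rotates a point about the z axis with a 3 x 3 matrix. The names transform and ROTATE_Z_90 are illustrative only; a real engine would use a tuned math library and usually 4 x 4 matrices.

```python
def transform(matrix, point):
    """Apply a 3 x 3 matrix to a 3-component point (matrix-vector product)."""
    return [sum(matrix[i][j] * point[j] for j in range(3)) for i in range(3)]


# Rotation by 90 degrees counter-clockwise about the z axis.
ROTATE_Z_90 = [[0, -1, 0],
               [1,  0, 0],
               [0,  0, 1]]

p = [1, 0, 0]
q = transform(ROTATE_Z_90, p)   # [0, 1, 0]
r = transform(ROTATE_Z_90, q)   # [-1, 0, 0]
```

Applying the same matrix twice gives a 180 degree turn, which is the kind of composition that makes matrices convenient for chaining transformations.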

Array: a collection of homogeneous elements.

Matrix: a simple row-and-column thing.

The two are different things in different domains. But in computer programming, a collection of single-dimension arrays can be termed a matrix. You can represent a 2D array (i.e., a collection of single-dimension arrays) in matrix form.

Example

A[2][3]: this means A is a collection of 2 single-dimension arrays, each of size 3.

A[1,1] A[1,2] A[1,3] // this is a single-dimension array

A[2,1] A[2,2] A[2,3] // this is another single-dimension array

// The collection is a multi-dimensional, or 2D, array.
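
The same layout can be sketched in Python with a list of lists. This is only an illustration: the variable names are mine, and Python indexes from 0, so the element written A[2,3] in the example is A[1][2] here.

```python
rows, cols = 2, 3

# Build each row separately; [[0] * cols] * rows would alias one row object.
A = [[0 for _ in range(cols)] for _ in range(rows)]

for i in range(rows):
    for j in range(cols):
        # Encode the 1-based position, e.g. 23 for row 2, column 3.
        A[i][j] = 10 * (i + 1) + (j + 1)

first_row = A[0]    # a single-dimension array: [11, 12, 13]
second_row = A[1]   # another single-dimension array: [21, 22, 23]
```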
