Difference between RowMatrix and Matrix in Apache Spark?

I would like to know the basic difference between the RowMatrix and Matrix classes available in Apache Spark.

A slightly more precise question would be: what is the difference between mllib.linalg.Matrix and mllib.linalg.distributed.DistributedMatrix?

  • Matrix is a trait that represents local matrices, which reside in the memory of a single machine. For now there are two basic implementations: DenseMatrix and SparseMatrix.
  • DistributedMatrix is a trait that represents distributed matrices built on top of RDDs. RowMatrix is an implementation of DistributedMatrix that stores data row-wise, without meaningful row indices. There are other implementations of DistributedMatrix (such as IndexedRowMatrix, CoordinateMatrix and BlockMatrix), each with its own storage strategy and specific set of methods. See, for example, Matrix Multiplication in Apache Spark.
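
To make the distinction concrete, here is a minimal conceptual sketch in plain Python. It deliberately does not use the Spark API: a local matrix is modelled as one nested list held by a single process, and a row-oriented distributed matrix is modelled as the same rows scattered over hypothetical partitions. The names `gramian`, `add_matrices` and `partitions` are illustrative assumptions, not Spark identifiers. The sketch shows why row order does not need to be meaningful for this kind of matrix: the Gramian AᵀA (which RowMatrix exposes as computeGramianMatrix) is a sum of per-row outer products, so partial results can be computed per partition and merged in any order.

```python
from functools import reduce

def gramian(rows, n_cols):
    """Return A^T A for the given rows as an n_cols x n_cols nested list."""
    g = [[0.0] * n_cols for _ in range(n_cols)]
    for row in rows:
        # Add the outer product of this row with itself.
        for i in range(n_cols):
            for j in range(n_cols):
                g[i][j] += row[i] * row[j]
    return g

def add_matrices(a, b):
    """Element-wise sum of two equally sized nested lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Local matrix: all rows live in the memory of one process.
local_matrix = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

# Row-oriented distributed matrix (modelled): the same rows, split across
# hypothetical partitions and stored in no particular order.
partitions = [
    [[5.0, 6.0]],
    [[1.0, 2.0], [3.0, 4.0]],
]

n_cols = 2
local_result = gramian(local_matrix, n_cols)

# Each partition computes its own partial Gramian; the partials are then summed.
partials = [gramian(part, n_cols) for part in partitions]
distributed_result = reduce(add_matrices, partials)
```

Both `local_result` and `distributed_result` hold the same 2 x 2 matrix, even though the partitioned rows are stored in a different order. Operations that do depend on row position need an implementation that keeps an index, which is the role of IndexedRowMatrix.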

This is going to come down a little to the idioms of the language / framework / discipline you're using, but in computer science, an array is a one-dimensional "list" of "things" that can be referenced by their position in the list. One of the things that can be in the list is another array, which lets you make arrays of arrays (of arrays of arrays ...), giving you a data set of arbitrarily large dimension.

A matrix comes from linear algebra and is a two-dimensional representation of data (which can be represented by an array of arrays) that comes with a powerful set of mathematical operations that allow you to manipulate the data in interesting ways. While arrays can vary in size, the width and height of a matrix are generally known in advance, based on the specific operations you're going to perform.

Matrices are used extensively in 3D graphics and physics engines because they are a fast, convenient way of representing transformation and acceleration data for objects in three dimensions.
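
As a small illustration of a matrix acting as a transformation, the sketch below rotates a 3D point about the z-axis using a 3 x 3 rotation matrix. It is plain Python with nested lists; the helper names `rotation_z` and `mat_vec` are made up for this example, and a real engine would use an optimized math library instead.

```python
import math

def rotation_z(theta):
    """3 x 3 matrix for a counter-clockwise rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, -s, 0.0],
        [s, c, 0.0],
        [0.0, 0.0, 1.0],
    ]

def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

point = [1.0, 0.0, 2.0]

# A quarter turn moves the point from the x-axis to the y-axis;
# the z coordinate is unchanged.
rotated = mat_vec(rotation_z(math.pi / 2), point)
```

Up to floating-point rounding, `rotated` is the point (0, 1, 2). The matrix has a fixed 3 x 3 shape because the operation (rotating a 3D vector) dictates it, which is the sense in which a matrix's dimensions are known up front.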

Array : A collection of homogeneous elements.

Matrix : A rectangular arrangement of values in rows and columns.

The two terms come from different fields. In computer programming, however, a collection of one-dimensional arrays can be called a matrix: a 2D array (i.e., a collection of one-dimensional arrays) can be written out in matrix form.

Example

A[2][3] : This means A is a collection of 2 one-dimensional arrays, each of size 3.

A[1,1] A[1,2] A[1,3] //This is a single dimensional array

A[2,1] A[2,2] A[2,3] //This is another single dimensional array

//The collection is a multi-dimensional or 2d Array.
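
The same layout can be sketched in Python with a list of lists. The values are arbitrary placeholders chosen so that each one spells out its row and column, and note that Python indexes from 0 whereas the notation above counts from 1.

```python
# A is a collection of 2 one-dimensional arrays, each of size 3.
A = [
    [11, 12, 13],  # first one-dimensional array (row 1)
    [21, 22, 23],  # second one-dimensional array (row 2)
]

n_rows = len(A)      # number of inner arrays
n_cols = len(A[0])   # size of each inner array

# A[1][2] is the second row, third column: A[2,3] in the 1-based notation.
element = A[1][2]

# Treating the 2D array as a matrix allows matrix operations such as transpose.
transposed = [list(col) for col in zip(*A)]
```

Here `element` is 23, and `transposed` is a 3 x 2 array whose rows are the columns of A.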
