
PySpark wrapper for IndexedRowMatrix multiply()

The PySpark wrapper for the IndexedRowMatrix class doesn't include all of the methods. Specifically, the multiply() method is missing, even though it's included in the Java implementation that it wraps. I tried adding it manually to pyspark/mllib/linalg/distributed.py, as follows:

def multiply(self, other):
    other_java_matrix = other._java_matrix_wrapper._java_model
    java_matrix = self._java_matrix_wrapper.call("multiply", other_java_matrix)
    return IndexedRowMatrix(java_matrix)

However, I get the following error when I try to use this method:

py4j.Py4JException: Method multiply([class org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:335)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:344)
    at py4j.Gateway.invoke(Gateway.java:252)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)

This is Spark version 1.6.1, so it should include this method AFAIK. Am I missing something?

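IndexedRowMatrix doesn't support multiplication by another IndexedRowMatrix. It only supports multiplication by a local Matrix (mllib.linalg.Matrix). That is why Py4J reports that the method does not exist: it finds no multiply() overload that accepts an IndexedRowMatrix argument.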
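
To make the shape of that operation concrete, here is a minimal conceptual sketch in plain Python. It is not Spark's implementation and uses no Spark API: ordinary lists stand in for the distributed rows, and the function name multiply_indexed_rows is hypothetical. It only illustrates that each indexed row can be multiplied independently, provided the right-hand matrix is small enough to be held locally.

```python
# Conceptual sketch only: plain Python lists stand in for Spark's
# distributed rows. The function name is hypothetical, not a Spark API.

def multiply_indexed_rows(indexed_rows, local_matrix):
    """Multiply each (index, row_vector) pair by a local matrix.

    indexed_rows: list of (row_index, list_of_floats)
    local_matrix: list of rows, each a list of floats
    """
    n_inner = len(local_matrix)
    n_cols = len(local_matrix[0])
    result = []
    for index, row in indexed_rows:
        if len(row) != n_inner:
            raise ValueError("row length must equal the number of rows in local_matrix")
        # Each output row depends only on one input row plus the local matrix,
        # which is why the rows can stay distributed.
        product = [
            sum(row[k] * local_matrix[k][j] for k in range(n_inner))
            for j in range(n_cols)
        ]
        result.append((index, product))
    return result


rows = [(0, [1.0, 2.0]), (2, [3.0, 4.0])]        # sparse row indices are allowed
local = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]       # 2 x 3 local matrix
product_rows = multiply_indexed_rows(rows, local)
```

The row indices pass through unchanged, so the result has the same row layout as the input.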

To multiply distributed matrices, you'll have to create a wrapper around BlockMatrix, which at this moment (Spark 1.6) is the only distributed structure in MLlib that supports multiplication by another distributed matrix.
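
As a rough illustration of why a block layout lends itself to distributed-by-distributed multiplication, here is a plain Python sketch of block-wise matrix multiplication. It is an assumption-laden toy, not Spark's BlockMatrix code: the dictionaries keyed by (block_row, block_col) and the helper names are hypothetical, and everything runs locally.

```python
# Conceptual sketch only: dicts of small dense blocks stand in for a
# distributed block matrix. All names here are hypothetical.

def matmul(a, b):
    """Dense product of two small matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matadd(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def block_multiply(left_blocks, right_blocks):
    """Multiply two block matrices stored as {(block_row, block_col): block}.

    Missing keys are treated as zero blocks.
    """
    result = {}
    for (i, k), a in left_blocks.items():
        for (k2, j), b in right_blocks.items():
            if k != k2:
                continue  # only blocks sharing the inner block index contribute
            partial = matmul(a, b)
            if (i, j) in result:
                result[(i, j)] = matadd(result[(i, j)], partial)
            else:
                result[(i, j)] = partial
    return result


# [[1, 2], [3, 4]] times [[5, 6], [7, 8]], each split into 1 x 1 blocks
left = {(0, 0): [[1.0]], (0, 1): [[2.0]], (1, 0): [[3.0]], (1, 1): [[4.0]]}
right = {(0, 0): [[5.0]], (0, 1): [[6.0]], (1, 0): [[7.0]], (1, 1): [[8.0]]}
block_product = block_multiply(left, right)
```

Each output block is a sum of products of small blocks, so neither operand ever has to be collected into a single local matrix.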
