AttributeError: 'DataFrame' object has no attribute 'get' on VectorAssembler spark ML

I'm trying to follow the example discussed here, so I copied its code into a Zeppelin paragraph:

%pyspark
import pandas as pd
from pyspark.sql import SQLContext
from pyspark.ml.feature import VectorAssembler
from pyspark.mllib.linalg import Vectors

dataset = sqlContext.createDataFrame(
    [(0, 18, 1.0, Vectors.dense([0.0, 10.0, 0.5]), 1.0)],
    ["id", "hour", "mobile", "userFeatures", "clicked"])
print(type(dataset))
assembler = VectorAssembler(
    inputCols=["hour", "mobile", "userFeatures"],
    outputCol="features")
output = assembler.transform(dataset)

However, I got this error:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark.py", line 164, in <module>
    intp.setStatementsFinished(output.get(), False)
  File "/home/zeppelin/zeppelin-0.5.5-incubating-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/dataframe.py", line 749, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'get'

Any advice?

You can try replacing

from pyspark.mllib.linalg import Vectors

with

from pyspark.ml.linalg import Vectors
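
It may also be worth looking at where the traceback fails: the call is `output.get()` inside `/tmp/zeppelin_pyspark.py`, not inside the pasted code. That suggests the paragraph's own `output` variable may have replaced a variable of the same name that Zeppelin's wrapper script relies on. If that is the case, renaming `output` in the paragraph (for example to `assembled`) might be enough. The sketch below is a simplified, hypothetical model of that mechanism in plain Python; `Logger`, `FakeDataFrame`, `run_paragraph` and `try_paragraph` are made-up names for illustration and are not Zeppelin's or Spark's actual code.

```python
# Hypothetical, simplified model of an interpreter wrapper script.
# This is NOT Zeppelin's real code; every name here is illustrative.


class Logger:
    """Stands in for the wrapper's own output buffer."""

    def __init__(self):
        self._chunks = []

    def write(self, text):
        self._chunks.append(text)

    def get(self):
        return "".join(self._chunks)


class FakeDataFrame:
    """Stands in for a DataFrame: unknown attributes raise AttributeError."""

    def __getattr__(self, name):
        raise AttributeError(
            "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))


def run_paragraph(user_code):
    """Run user code in the same namespace that holds the wrapper's `output`."""
    namespace = {"output": Logger(), "FakeDataFrame": FakeDataFrame}
    exec(user_code, namespace)
    # The wrapper assumes `output` is still its own buffer at this point.
    return namespace["output"].get()


def try_paragraph(user_code):
    """Report whether the wrapper survives the given user code."""
    try:
        return "ok: %r" % run_paragraph(user_code)
    except AttributeError as exc:
        return "failed: %s" % exc


clobbering = "output = FakeDataFrame()"   # user code reuses the name `output`
renamed = "assembled = FakeDataFrame()"   # same code with a different name

print(try_paragraph(clobbering))
print(try_paragraph(renamed))
```

In this model the first paragraph fails with the same kind of `AttributeError` as in the question, while the renamed one goes through. Whether your Zeppelin version behaves this way is an assumption you would need to check by renaming the variable and re-running the paragraph.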
