Unable to create dataframe from RDD
I am trying to build a recommender system from this Kaggle dataset: f7a1f242-c
https://www.kaggle.com/kerneler/starter-user-artist-playcount-dataset-f7a1f242-c
The file is named "user_artist_data_small.txt".
The data looks like this:
1059637 1000010 238
1059637 1000049 1
1059637 1000056 1
1059637 1000062 11
1059637 1000094 1
I get an error on the third-to-last line of my code (the `model = als.fit(training)` call).
!pip install pyspark==3.0.1 py4j==0.10.9
from pyspark.sql import SparkSession
from pyspark import SparkContext
appName="Collaborative Filtering with PySpark"
from pyspark.sql.types import StructType,StructField,IntegerType,StringType,LongType
from pyspark.sql.functions import col
from pyspark.ml.recommendation import ALS
from google.colab import drive
drive.mount ('/content/gdrive')
spark = SparkSession.builder.appName(appName).getOrCreate()
sc = spark.sparkContext
userArtistData1=sc.textFile("/content/gdrive/My Drive/data/user_artist_data_small.txt")
schema_user_artist = StructType([StructField("userId",StringType(),True),StructField("artistId",StringType(),True),StructField("playCount",StringType(),True)])
userArtistRDD = userArtistData1.map(lambda k: k.split())
user_artist_df = spark.createDataFrame(userArtistRDD,schema_user_artist,['userId','artistId','playCount'])
ua = user_artist_df.alias('ua')
(training, test) = ua.randomSplit([0.8, 0.2])  # Training the model
als = ALS(maxIter=5, implicitPrefs=True, userCol="userId", itemCol="artistId", ratingCol="playCount", coldStartStrategy="drop")
model = als.fit(training)  # predict using the testing dataset
predictions = model.transform(test)
predictions.show()
The error is:
IllegalArgumentException: requirement failed: Column userId must be of type numeric but was actually of type string.
So I changed the types in the schema from StringType to IntegerType, and then I got this error:
TypeError: field userId: IntegerType can not accept object '1059637' in type <class 'str'>
That number happens to be the first item in the dataset. Can anyone help?
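Just create the dataframe with the CSV reader (using a space as the separator) instead of building an RDD. Keep the IntegerType version of schema_user_artist, since ALS requires numeric columns; the CSV reader parses each field into the declared type, whereas k.split() only ever produces strings: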
user_artist_df = spark.read.schema(schema_user_artist).csv('/content/gdrive/My Drive/data/user_artist_data_small.txt', sep=' ')
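If you would rather keep the RDD approach, the TypeError should go away once each field is converted to int before the schema is applied, because createDataFrame verifies values against the schema but does not cast them. The following is only a minimal sketch under that assumption: parse_line and build_dataframe are hypothetical helper names (not part of PySpark), and it assumes every line holds exactly three whitespace-separated integers.

```python
def parse_line(line):
    """Split one whitespace-separated record and convert each field to int.

    Hypothetical helper; assumes exactly three integer fields per line.
    """
    user_id, artist_id, play_count = line.split()
    return int(user_id), int(artist_id), int(play_count)


def build_dataframe(spark, path):
    """Build a DataFrame with integer columns from the raw text file.

    Hypothetical helper; `spark` is an existing SparkSession and `path`
    points at user_artist_data_small.txt.
    """
    # Imported here so parse_line can be tried without Spark installed.
    from pyspark.sql.types import StructType, StructField, IntegerType

    schema = StructType([
        StructField("userId", IntegerType(), True),
        StructField("artistId", IntegerType(), True),
        StructField("playCount", IntegerType(), True),
    ])
    # Each RDD element is now a tuple of ints, which IntegerType accepts.
    rows = spark.sparkContext.textFile(path).map(parse_line)
    return spark.createDataFrame(rows, schema)


# Quick check on the first sample row from the question.
print(parse_line("1059637 1000010 238"))  # (1059637, 1000010, 238)
```

With the session from the question, build_dataframe(spark, "/content/gdrive/My Drive/data/user_artist_data_small.txt") would take the place of the createDataFrame call, and the rest of the ALS code stays the same.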