Apache SPARK with SQLContext :: IndexError

Question

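I am trying to run the basic example given in the "Inferring the Schema Using Reflection" section of the Apache Spark documentation.

I am doing this on the Cloudera Quickstart VM (CDH5).

The example I am trying to run is as follows: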

# sc is an existing SparkContext.
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)

# Load a text file and convert each line to a Row.
lines = sc.textFile("/user/cloudera/analytics/book6_sample.csv")
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))

# Infer the schema, and register the DataFrame as a table.
schemaPeople = sqlContext.createDataFrame(people)
schemaPeople.registerTempTable("people")

# SQL can be run over DataFrames that have been registered as a table.
teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")

# The results of SQL queries are RDDs and support all the normal RDD operations.
teenNames = teenagers.map(lambda p: "Name: " + p.name)
for teenName in teenNames.collect():
  print(teenName)

I run the code exactly as shown above, but when I execute the last command (the for loop), I always get the error "IndexError: list index out of range".
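To narrow this down without Spark, the two lambdas can be replayed in plain Python on a few lines. The sample lines below are hypothetical stand-ins for book6_sample.csv, and to_person is an illustrative helper that mirrors the split and the int(p[1]) conversion but returns a dict instead of a pyspark Row. A line that contains no comma, such as an empty one, raises the same IndexError:

```python
# Plain-Python replay of the question's two lambdas, without Spark.
# to_person and sample_lines are hypothetical, for illustration only.

def to_person(line):
    """Mirror lines.map(split) + parts.map(Row(...)), returning a dict instead of a Row."""
    p = line.split(",")
    return {"name": p[0], "age": int(p[1])}

# Hypothetical input; the last entry stands for an empty line in the file.
sample_lines = ["Michael,29", "Andy,30", ""]

for line in sample_lines:
    try:
        print(to_person(line))
    except IndexError as exc:
        # "".split(",") returns [""], so p[1] does not exist.
        print("Failed on %r: %s" % (line, exc))
```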

The input file book6_sample is available at book6_sample.csv.


Please tell me where I am going wrong.

Thanks in advance.

Regards, Sri

Answer 1

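Your file has an empty line at the end, and that line causes this error. For an empty string, l.split(",") returns [''], a one-element list, so p[1] in the Row(...) lambda raises "IndexError: list index out of range". Because Spark evaluates transformations lazily, the bad line is only processed when collect() forces Spark to evaluate every line, which is why the error appears at the for loop. Open the file in a text editor and delete the trailing empty line; it should then work.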
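If you would rather not edit the input by hand, a minimal sketch is to skip blank or malformed lines while parsing. parse_person is a hypothetical helper written for RDD.flatMap: returning an empty list drops a line, and a one-element list keeps it. It is demonstrated here on a plain Python list standing in for the file, so it runs without Spark:

```python
def parse_person(line):
    """Return [(name, age)] for a well-formed 'name,age' line, or [] to skip it.

    Hypothetical helper meant for rdd.flatMap: an empty list drops the line,
    a one-element list keeps it.
    """
    line = line.strip()
    if not line:
        return []  # blank line, e.g. an empty line at the end of the file
    p = line.split(",")
    if len(p) < 2:
        return []  # no age column
    try:
        return [(p[0].strip(), int(p[1]))]
    except ValueError:
        return []  # age is not an integer (e.g. a header row)

# Hypothetical input standing in for the lines of book6_sample.csv.
sample_lines = ["Michael,29", "Andy,30", "Justin,19", ""]

# Flattening the per-line lists mimics what rdd.flatMap does.
parsed = [rec for line in sample_lines for rec in parse_person(line)]
print(parsed)  # [('Michael', 29), ('Andy', 30), ('Justin', 19)]
```

In the question's code this could replace the parts and people lines with something like people = lines.flatMap(parse_person).map(lambda t: Row(name=t[0], age=t[1])). If only blank lines need handling, adding lines.filter(lambda l: l.strip()) before the split is the smaller change. Silently dropping malformed lines can hide data problems, so counting or logging them may be worthwhile on real input.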

Solution 1 (accepted, score 0), 2016-06-28 06:40:54