Reading a csv file as a spark dataframe
I have a CSV file with a header that has to be read as a data frame through Spark (2.0.0, with Scala 2.11.8).
Sample csv data:
Item,No. of items,Place
abc,5,xxx
def,6,yyy
ghi,7,zzz
.........
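When I try to read this csv data as a data frame in Spark, I run into a problem because the header contains a column (No. of items) with the special character ".".
The code I used to read the csv data is: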
val spark = SparkSession.builder().appName("SparkExample").getOrCreate()
import spark.implicits._
val df = spark.read.option("header", "true").csv("file:///INPUT_FILENAME")
The error I'm getting:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Unable to resolve No. of items given [Item,No. of items,Place];
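If I remove the "." from the header, I don't get any error. I even tried escaping the character, but that escapes every "." character, including those in the data.
Is there any way to escape the special character "." only in the CSV header, using Spark code alone?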
@Pooja Nayak, not sure whether this has been solved already; answering it in the interest of the community.
sc: SparkContext
spark: SparkSession
sqlContext: SQLContext
// Read the raw file from localFS as-is.
val rdd_raw = sc.textFile("file:///home/xxxx/sample.csv")
// Drop the first line in first partition because it is the header.
val rdd = rdd_raw.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 0) iter.drop(1) else iter
}
// A function to create schema dynamically.
def schemaCreator(header: String): StructType = {
  StructType(header
    .split(",")
    .map(field => StructField(field.trim, StringType, true))
  )
}
// Create the schema for the csv that was read and store it.
val csvSchema: StructType = schemaCreator(rdd_raw.first)
// As the input is CSV, split it at "," and trim away the whitespaces.
val rdd_curated = rdd.map(x => x.split(",").map(y => y.trim)).map(xy => Row(xy:_*))
// Create the DF from the RDD.
val df = sqlContext.createDataFrame(rdd_curated, csvSchema)
The necessary imports:
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import org.apache.spark._
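The schema-from-header idea above keeps the "." in the column name, so the column still has to be referenced carefully later on. A variation that cleans the header names before building the DataFrame might look like the PySpark sketch below. The helper names `clean_header` and `load_csv_with_clean_header` are made up for illustration, and the choice to replace punctuation with underscores is an assumption, not something the answer prescribes. Like the Scala code, it splits naively on "," and does not handle quoted fields.

```python
import re


def clean_header(header_line, sep=","):
    """Split a CSV header line and make each name free of dots and spaces."""
    names = []
    for raw in header_line.split(sep):
        # Collapse every run of non-alphanumeric characters into one underscore,
        # then drop any underscore left at either end.
        name = re.sub(r"[^0-9A-Za-z]+", "_", raw.strip()).strip("_")
        names.append(name)
    return names


def load_csv_with_clean_header(spark, path, sep=","):
    """Hypothetical helper: read `path` as text and return an all-string DataFrame.

    `spark` is an existing SparkSession.
    """
    lines = spark.sparkContext.textFile(path)
    header = lines.first()
    columns = clean_header(header, sep)
    rows = (lines
            # Note: this also drops any data line identical to the header.
            .filter(lambda line: line != header)
            .map(lambda line: [cell.strip() for cell in line.split(sep)]))
    # Passing a list of names lets Spark infer the column types (strings here).
    return spark.createDataFrame(rows, columns)
```

With the sample header from the question, `clean_header("Item,No. of items,Place")` gives `["Item", "No_of_items", "Place"]`, so the resulting columns can be selected without any quoting.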
Here is an example that works for me in pyspark; hopefully the same approach works for you with a few language-specific syntax changes.
file =r'C:\Users\e5543130\Desktop\sampleCSV2.csv'
conf = SparkConf().setAppName('FICBOutputGenerator')
sc = SparkContext(conf=conf)
sc.setLogLevel("ERROR")
sqlContext = SQLContext(sc)
df = sqlContext.read.options(delimiter=",", header="true").csv("cars.csv") #Without deprecated API
df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").option("delimiter", ",").load("cars.csv")
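Once a DataFrame with a dotted column name has been loaded, the "." is normally read as a struct-field accessor unless the name is wrapped in backticks. The sketch below assumes a Spark version in which the load itself succeeds, which may not be true of the 2.0.0 setup in the question. `quoted` and `strip_dots` are hypothetical helper names used only for illustration.

```python
def quoted(name):
    """Wrap a column name in backticks so Spark treats it as one identifier."""
    # Assumes the name itself contains no backtick character.
    return "`" + name + "`"


def strip_dots(df):
    """Hypothetical helper: return `df` with '.' removed from every column name."""
    # toDF assigns new names by position, so the old names are never parsed.
    return df.toDF(*[name.replace(".", "") for name in df.columns])
```

For example, `df.select(quoted("No. of items"))` selects the column under its original name, while `strip_dots(df)` renames it to `No of items` so that later references need no quoting for the dot.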