Wrong schema while reading a CSV file as a DataFrame
I'm trying to read a CSV file into a DataFrame with simple code:
df = spark.read.csv("1.csv")
I got:
df.printSchema()
root
|-- _c0: string (nullable = true)
I also tried this:
db = spark.read.csv("1.csv", header=True, inferSchema=True)
db.printSchema()
root
|-- id | date | cases | country | deaths | cities | per_cap |
Thanks in advance for your help.
Apparently, your column separator is a pipe (|), not a comma.
Try:
db = spark.read.csv("1.csv", sep="|", header=True, inferSchema=True)
# the header fields are padded with spaces ("id ", " date ", ...), so trim them
for col in db.columns:
    db = db.withColumnRenamed(col, col.strip())
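To see why the first attempt produced a single column, here is a small sketch using Python's standard csv module rather than Spark. This is an assumption for illustration only: Spark's CSV reader is a separate implementation, but the delimiter behaviour is the same idea. The header line is a copy of the one shown in the question's schema output.

```python
import csv
import io

# Header line in the same layout as the question's file: fields joined by " | ".
sample = "id | date | cases | country | deaths | cities | per_cap\n"

# Default delimiter (comma): the whole line comes back as one field,
# much like the single column Spark reported.
default_row = next(csv.reader(io.StringIO(sample)))

# Pipe delimiter: seven fields, but each keeps its surrounding spaces.
pipe_row = next(csv.reader(io.StringIO(sample), delimiter="|"))

# Stripping each name mirrors the withColumnRenamed(col, col.strip()) loop.
clean_names = [name.strip() for name in pipe_row]

print(len(default_row))  # 1
print(pipe_row[:2])      # ['id ', ' date ']
print(clean_names)
```

The padded names (for example "id " with a trailing space) are why the answer renames every column after reading.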
You should define your own schema explicitly.
Using Scala:
import org.apache.spark.sql.types._

val schemaExpected = new StructType()
  .add("id", StringType, nullable = true)
  .add("date", DateType, nullable = true)
  // ... (remaining columns)
  .add("deaths", IntegerType, nullable = true)
Then you can read your DataFrame:
val db = spark.read.option("header", "true").option("sep", "|").schema(schemaExpected).csv("1.csv")
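For comparison, here is a rough plain-Python sketch of the same idea: declare each column's type up front and convert every field accordingly, instead of relying on inference. It uses the standard csv module, not Spark. The names SCHEMA and read_with_schema are hypothetical, the types for the columns elided in the Scala snippet (cases, country, cities, per_cap) are guesses, and the sample row is invented.

```python
import csv
import io
from datetime import date

# Hypothetical schema loosely mirroring schemaExpected: column name -> converter.
# Types for cases, country, cities and per_cap are assumptions.
SCHEMA = [
    ("id", str),
    ("date", date.fromisoformat),
    ("cases", int),
    ("country", str),
    ("deaths", int),
    ("cities", str),
    ("per_cap", float),
]

def read_with_schema(text, schema, sep="|"):
    """Parse delimited text, skip the header, and apply the declared types."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    next(reader)  # header row: column names come from the schema, not the file
    rows = []
    for fields in reader:
        values = [f.strip() for f in fields]  # drop the padding around "|"
        rows.append({name: convert(v) for (name, convert), v in zip(schema, values)})
    return rows

# Made-up sample data in the question's pipe-separated layout.
sample = (
    "id | date | cases | country | deaths | cities | per_cap\n"
    "1 | 2020-03-01 | 10 | Italy | 2 | Rome | 0.5\n"
)
rows = read_with_schema(sample, SCHEMA)
print(rows[0]["date"], rows[0]["deaths"])  # 2020-03-01 2
```

In Spark itself the equivalent is the .schema(...) call above; this sketch only shows why declared types avoid guessing, and it does no error handling for malformed rows.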