
Wrong schema while reading a CSV file as a DataFrame

I'm trying to read a CSV file into a DataFrame with this simple code:

df = spark.read.csv("1.csv")

I got:

df.printSchema()
root
 |-- _c0: string (nullable = true)

I also tried this:

db = spark.read.csv("1.csv", header=True, inferSchema=True)
db.printSchema()
root
 |--                   id                  |                      date                      |                              cases                               |                      country                      |                       deaths                       |   cities   |    per_cap     | 

Thanks in advance for your help.

Apparently your field separator is a pipe (|), not the comma that spark.read.csv expects by default, so each whole line ends up in a single column.
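To see why, here is a small sketch using Python's standard csv module rather than Spark (the sample text is hypothetical, modeled on the padded header in the question). Splitting on a comma leaves the whole header line as one field, which matches the single _c0 column; splitting on | gives separate fields whose names still carry their space padding, and in this sketch the trailing | also produces an extra, empty name.

```python
import csv
import io
from typing import List

# Hypothetical sample modeled on the question: space-padded, pipe-delimited, trailing "|"
SAMPLE = "   id   |   date   |   cases   | \n   1    | 2020-03-01 |   12    | \n"


def header_fields(text: str, delimiter: str) -> List[str]:
    """Return the raw header cells that csv.reader produces for the given delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader)


# Comma delimiter: the whole header line stays one field (like Spark's single _c0 column)
comma_fields = header_fields(SAMPLE, ",")
# Pipe delimiter: separate fields, but each name keeps its surrounding spaces
pipe_fields = header_fields(SAMPLE, "|")
# Stripping the padding gives usable column names
clean_names = [field.strip() for field in pipe_fields]

print(len(comma_fields))
print(pipe_fields)
print(clean_names)
```

This is why the answer below both sets the separator and strips the column names.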

Try:

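# sep="|" splits on pipes; the whitespace options trim the space padding around values
db = spark.read.csv("1.csv", sep="|", header=True, inferSchema=True,
                    ignoreLeadingWhiteSpace=True, ignoreTrailingWhiteSpace=True)

# Header cells are padded with spaces, so strip them from every column name
for name in db.columns:
    db = db.withColumnRenamed(name, name.strip())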

You should create your own schema. Using Scala:

import org.apache.spark.sql.types._

val schemaExpected = new StructType()
  .add("id", StringType, nullable = true)
  .add("date", DateType, nullable = true)
  // ...
  .add("deaths", IntegerType, nullable = true)

Then you can read your DataFrame:

val db = spark.read.option("header", "true").option("delimiter", "|") // the file is pipe-delimited
  .schema(schemaExpected).csv("1.csv")
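The idea behind an explicit schema (declaring each column's type up front instead of letting the reader infer it) can be sketched without Spark using Python's csv module. This is only an analogy, not Spark's API: SCHEMA and read_with_schema are hypothetical names, the converters stand in for StringType, DateType and IntegerType, only a subset of the question's columns is shown, and ISO dates (YYYY-MM-DD) are assumed.

```python
import csv
import io
from datetime import date
from typing import Callable, Dict, List, Optional

# Hypothetical explicit schema: column name -> converter (only some columns shown)
SCHEMA: Dict[str, Callable[[str], object]] = {
    "id": str,                    # like StringType
    "date": date.fromisoformat,   # like DateType, assuming YYYY-MM-DD
    "deaths": int,                # like IntegerType
}


def read_with_schema(text: str, schema: Dict[str, Callable[[str], object]]
                     ) -> List[Dict[str, Optional[object]]]:
    """Parse pipe-delimited text and convert each declared column with its converter."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    header = [name.strip() for name in next(reader)]
    rows = []
    for raw in reader:
        cells = dict(zip(header, (cell.strip() for cell in raw)))
        row: Dict[str, Optional[object]] = {}
        for name, convert in schema.items():
            value = cells.get(name, "")
            # Empty or unparsable cells become None in this sketch, roughly like a nullable field
            try:
                row[name] = convert(value) if value else None
            except ValueError:
                row[name] = None
        rows.append(row)
    return rows


SAMPLE = " id | date | deaths | \n a1 | 2020-03-01 | 7 | \n a2 | not-a-date | | \n"
rows = read_with_schema(SAMPLE, SCHEMA)
print(rows)
```

Declaring the types means a bad value surfaces as a null in a known column instead of silently turning the whole column into strings.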
