How to read a CSV file that contains empty spaces between columns into a DataFrame in Scala?
First row from the CSV:
058921107 039128053 20200701-290640-0 20200701 000000BORGWARNER ITHACA LLC DBA BORGWARNE 489140-10001 LDD INVENTORY 039128053 1 4359697 PACKAGE,CHAIN DRIVE 005 285000492 0 19691231 185959 0 20200101 00000020200630 000000IMMEDIATE 1600 20200630 000000
Sample script used:
import org.apache.spark.sql.{DataFrame, SparkSession}
// `spark` is the active SparkSession (predefined in spark-shell)
// Attempt: treat a single space as the delimiter and trim whitespace around values
val df1: DataFrame = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", " ")
  .option("ignoreLeadingWhiteSpace", "true")
  .option("ignoreTrailingWhiteSpace", "true")
  .csv("test.csv")
df1.show(2)
Whether or not it is correct, I set the number of columns to 18.
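from pyspark.sql.functions import col, regexp_replace, split
# Read each line as a single string column named 'value'
df = spark.read.text('test.csv')
col_size = 18  # assumed number of columns
# Strip trailing spaces, turn runs of 2+ spaces into '|', then split on '|'
df.withColumn('value', split(regexp_replace(regexp_replace('value', r'([ ]*)$', ''), r'([ ]{2,})', r'\|'), r'\|')) \
  .select(*[col('value')[i] for i in range(0, col_size)]) \
  .toDF(*[f'col{i + 1}' for i in range(0, col_size)]).show(30, False)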
+---------+---------+-----------------+--------------------------------------------------+------------+-------------+---------+----+-------+-------------------+-----+---------+-----+---------------+-----+---------------------------------------+-----+---------------+
|col1     |col2     |col3             |col4                                              |col5        |col6         |col7     |col8|col9   |col10              |col11|col12    |col13|col14          |col15|col16                                  |col17|col18          |
+---------+---------+-----------------+--------------------------------------------------+------------+-------------+---------+----+-------+-------------------+-----+---------+-----+---------------+-----+---------------------------------------+-----+---------------+
|058921107|039128053|20200701-290640-0|20200701 000000BORGWARNER ITHACA LLC DBA BORGWARNE|489140-10001|LDD INVENTORY|039128053|1   |4359697|PACKAGE,CHAIN DRIVE|005  |285000492|0    |19691231 185959|0    |20200101 00000020200630 000000IMMEDIATE|1600 |20200630 000000|
+---------+---------+-----------------+--------------------------------------------------+------------+-------------+---------+----+-------+-------------------+-----+---------+-----+---------------+-----+---------------------------------------+-----+---------------+
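To see what the two nested regexp_replace calls and the split are doing, here is a minimal plain-Python sketch of the same steps using only the standard `re` module, with no Spark involved. The function name `split_on_space_runs` and the sample line are made up for illustration, and padding missing columns with `None` is a choice made for this sketch rather than a statement about Spark's behavior.

```python
import re

def split_on_space_runs(line, col_size):
    """Split a line on runs of two or more spaces (hypothetical helper)."""
    # Step 1: drop trailing spaces, like regexp_replace('value', '([ ]*)$', '')
    trimmed = re.sub(r'[ ]*$', '', line)
    # Step 2: replace every run of 2+ spaces with a single '|'
    piped = re.sub(r'[ ]{2,}', '|', trimmed)
    # Step 3: split on the '|' marker
    fields = piped.split('|')
    # Step 4: keep exactly col_size entries, padding with None if the line is short
    return [fields[i] if i < len(fields) else None for i in range(col_size)]

# Illustrative line: fields separated by two or more spaces, plus trailing spaces
sample = "058921107  039128053   LDD INVENTORY    1  "
print(split_on_space_runs(sample, 5))
# prints ['058921107', '039128053', 'LDD INVENTORY', '1', None]
```

Two limitations follow from this approach: values separated by only a single space stay in one field (as col4 and col16 in the output above show), and a value that itself contains `|` would be split in two.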
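Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.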