
How to read a CSV file that contains empty spaces between columns into a DataFrame in Scala?

I am trying to load a CSV file that contains empty spaces between its columns. (Screenshot: the CSV file opened in Notepad.)

First row of the CSV:

058921107                          039128053                          20200701-290640-0             20200701 000000BORGWARNER ITHACA LLC DBA BORGWARNE                         489140-10001                       LDD INVENTORY                                               039128053           1     4359697                                           PACKAGE,CHAIN DRIVE                                                                                 005                 285000492           0                     19691231 185959                              0                     20200101 00000020200630 000000IMMEDIATE                1600                  20200630 000000   

Sample script I used:

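import org.apache.spark.sql.{DataFrame, SparkSession}

// In spark-shell a `spark` session already exists; getOrCreate() reuses it
val spark = SparkSession.builder().getOrCreate()

// Note: delimiter " " makes every single space a separator, so runs of spaces
// between values tend to produce empty columns, and values that contain a space
// (e.g. "LDD INVENTORY") are split in two.
// If the first line of test.csv is data rather than column names, use header "false".
val df1: DataFrame = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", " ")
  .option("ignoreLeadingWhiteSpace", "true")
  .option("ignoreTrailingWhiteSpace", "true")
  .csv("test.csv")

df1.show(2)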
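Why a single-space delimiter misbehaves can be seen without Spark: each individual space acts as a separator, so a run of spaces is expected to turn into several empty fields instead of one boundary, and a value that itself contains a space gets cut in two. A rough plain-Python analogy using `str.split(" ")` (not Spark's CSV parser); the shortened excerpt of the sample row is hypothetical:

```python
# Plain-Python illustration (not Spark) of splitting on a single-space delimiter.
# The excerpt is a shortened, hypothetical version of the sample row.
line = "039128053" + " " * 3 + "1" + " " * 5 + "4359697" + " " * 3 + "LDD INVENTORY"

# Every single space is a separator, similar in spirit to .option("delimiter", " ")
fields = line.split(" ")

# Runs of spaces become empty fields, and "LDD INVENTORY" is cut into two values
empty_count = fields.count("")
print(fields)
print(empty_count)
```

Because some values contain single spaces, "two or more spaces" is a common compromise for the separator rule.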

Right or wrong, I set the number of columns to 18:

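from pyspark.sql import SparkSession
from pyspark.sql.functions import col, regexp_replace, split

spark = SparkSession.builder.getOrCreate()
df = spark.read.text('test.csv')  # one string column, "value", per line

col_size = 18  # hard-coded number of columns per row

# Trim trailing spaces, replace each run of 2+ spaces with "|", then split on "|"
df.withColumn('value', split(regexp_replace(regexp_replace('value', r'([ ]*)$', ''), r'([ ]{2,})', '|'), r'\|')) \
  .select(*[col('value')[i] for i in range(0, col_size)]) \
  .toDF(*[f'col{i + 1}' for i in range(0, col_size)]).show(30, False)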
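The regexp_replace/split chain above can be mirrored in plain Python to check, row by row, what it does to a line before running it in Spark. This is a sketch, not Spark itself; the helper name `split_spaced_row`, the padding of short rows with `None`, and the excerpt row are all hypothetical choices for illustration:

```python
import re


def split_spaced_row(line, col_size):
    """Plain-Python mirror of the regexp_replace/split chain (a sketch, not Spark)."""
    # Drop trailing spaces (the inner regexp_replace with '([ ]*)$')
    trimmed = line.rstrip(" ")
    # Replace each run of two or more spaces with "|" (the outer regexp_replace)
    piped = re.sub(r"[ ]{2,}", "|", trimmed)
    # Split on the literal "|" (split(..., r'\|'))
    parts = piped.split("|")
    # Keep exactly col_size slots; this sketch pads short rows with None
    return (parts + [None] * col_size)[:col_size]


# Hypothetical excerpt of the sample row: single spaces inside values,
# and two fields that touch with no space between them
row = ("489140-10001" + " " * 7 + "LDD INVENTORY" + " " * 9
       + "20200101 000000" + "20200630 000000IMMEDIATE" + " " * 4)

print(split_spaced_row(row, 4))
```

Single spaces inside a value such as "LDD INVENTORY" survive, but the rule has blind spots: fields that touch with no space stay glued together, two fields separated by exactly one space are merged, and a literal "|" in the data would produce an extra split.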

+---------+---------+-----------------+--------------------------------------------------+------------+-------------+---------+----+-------+-------------------+-----+---------+-----+---------------+-----+---------------------------------------+-----+---------------+
|col1     |col2     |col3             |col4                                              |col5        |col6         |col7     |col8|col9   |col10              |col11|col12    |col13|col14          |col15|col16                                  |col17|col18          |
+---------+---------+-----------------+--------------------------------------------------+------------+-------------+---------+----+-------+-------------------+-----+---------+-----+---------------+-----+---------------------------------------+-----+---------------+
|058921107|039128053|20200701-290640-0|20200701 000000BORGWARNER ITHACA LLC DBA BORGWARNE|489140-10001|LDD INVENTORY|039128053|1   |4359697|PACKAGE,CHAIN DRIVE|005  |285000492|0    |19691231 185959|0    |20200101 00000020200630 000000IMMEDIATE|1600 |20200630 000000|
+---------+---------+-----------------+--------------------------------------------------+------------+-------------+---------+----+-------+-------------------+-----+---------+-----+---------------+-----+---------------------------------------+-----+---------------+
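In this output, col4 ("20200701 000000" followed directly by "BORGWARNER ITHACA LLC DBA BORGWARNE") and col16 each still hold two values, because those fields touch with no spaces in between. That pattern suggests the file may be fixed-width rather than space-delimited; if so, slicing each line at known character offsets avoids guessing at separators. A minimal plain-Python sketch, assuming the field widths are known (the helper `slice_fixed_width` and the widths below are hypothetical, not the real layout of test.csv):

```python
def slice_fixed_width(line, widths):
    """Cut a line into consecutive fields of the given widths and strip padding.

    The widths passed in are hypothetical; the real layout of test.csv
    would have to come from its spec or be measured from the file.
    """
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


# Hypothetical widths: a 15-character timestamp directly followed by a
# space-padded name field, like the two values glued together in col4
widths = [10, 15, 20, 12]
line = "058921107 " + "20200701 000000" + "BORGWARNER ITHACA".ljust(20) + "IMMEDIATE".ljust(12)

print(slice_fixed_width(line, widths))
```

In Spark the same idea could be expressed by reading with `spark.read.text` and taking `trim(substring('value', pos, len))` for each field; `pyspark.sql.functions.substring` uses 1-based positions.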

