Spark Dataframe from a different data format
I have the dataset below and need to create a Spark DataFrame from it in Scala. The data is a single column in a CSV file, and the column name is dataheader.
dataheader
"{""date_time"":""1999/05/22 03:03:07.011"",""cust_id"":""cust1"",""timestamp"":944248234000,""msgId"":""113"",""activityTimeWindowMilliseconds"":20000,""ec"":""event1"",""name"":""ABC"",""entityId"":""1001"",""et"":""StateChange"",""logType"":""type123,""lastActivityTS"":944248834000,""sc_id"":""abc1d1c9"",""activityDetectedInLastTimeWindow"":true}"
"{""date_time"":""1999/05/23 03:03:07.011"",""cust_id"":""cust1"",""timestamp"":944248234000,""msgId"":""114"",""activityTimeWindowMilliseconds"":20000,""ec"":""event2"",""name"":""ABC"",""entityId"":""1001"",""et"":""StateChange"",""logType"":""type123,""lastActivityTS"":944248834000,""sc_id"":""abc1d1c9"",""activityDetectedInLastTimeWindow"":true}"
I am able to read the CSV file:
val df_tmp = spark
  .read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("quoteMode", "ALL")
  .option("delimiter", ",")
  .option("escape", "\"")
  //.option("inferSchema", "true")
  .option("multiline", "true")
  .load("D:\\dataFile.csv")
I tried to split the data into separate columns in the DataFrame, but without success.
One thing I noticed in the data is that both keys and values are enclosed in doubled double quotes: ""key1"":""value1""
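The doubled quotes are most likely just CSV escaping: inside a quoted CSV field, a literal double quote is written as two double quotes. A minimal sketch in plain Scala of what a CSV parser does to such a field is shown here. The helper name `unquoteCsvField` and the shortened record are hypothetical and only serve as illustration.

```scala
// Hypothetical helper: undo CSV quoting for one quoted field.
def unquoteCsvField(field: String): String = {
  val trimmed = field.trim
  if (trimmed.length >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")) {
    // Drop the enclosing quotes, then collapse each doubled quote into one.
    trimmed.substring(1, trimmed.length - 1).replace("\"\"", "\"")
  } else trimmed
}

// Shortened record in the same style as the rows in the file (illustration only).
val raw = "\"{\"\"cust_id\"\":\"\"cust1\"\",\"\"msgId\"\":\"\"113\"\"}\""
val json = unquoteCsvField(raw)

println(json)
```

After unquoting, the field is an ordinary JSON object string, which suggests the column can be handled with JSON tooling once the CSV reader has removed the quoting.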
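If you want the fields inside the data column, you need to parse that column and write the result to a new CSV file. The value is clearly a JSON-formatted string.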
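One possible way to do that parsing in Spark itself is `from_json` with an explicit schema. The sketch below is not the answerer's code. It assumes the column is named `dataheader`, that the CSV reader has already turned the doubled quotes into single ones, and it uses a small in-memory DataFrame with a subset of the fields as a stand-in for `df_tmp`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{BooleanType, LongType, StringType, StructType}

val spark = SparkSession.builder()
  .appName("json-column-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Stand-in for df_tmp: one JSON string per row, column name assumed to be "dataheader".
val df_tmp = Seq(
  """{"date_time":"1999/05/22 03:03:07.011","cust_id":"cust1","timestamp":944248234000,"msgId":"113","activityDetectedInLastTimeWindow":true}"""
).toDF("dataheader")

// Explicit schema for the fields of interest (a subset, chosen for illustration).
val schema = new StructType()
  .add("date_time", StringType)
  .add("cust_id", StringType)
  .add("timestamp", LongType)
  .add("msgId", StringType)
  .add("activityDetectedInLastTimeWindow", BooleanType)

// Parse the JSON string into a struct, then expand the struct into top-level columns.
val parsed = df_tmp
  .select(from_json(col("dataheader"), schema).as("j"))
  .select("j.*")

parsed.show(false)
```

`from_json` returns null for a row it cannot parse, so malformed records show up as null columns rather than as an error. The flattened result could then be written out with `parsed.write.option("header", "true").csv(...)` to an output path of your choosing.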