How can I transpose CSV data using Java Spark
I am using Java Spark, and I would like to know whether it is possible to transform the sample data below
Incremental Cost Number | Approver Names
---------------------------------------------------------------------------------
S703401 | Ryan P Cassidy|Christopher J Mattingly|Frank E LaSota|Ryan P Cassidy|Anthony L Locricchio|Jason Monte
into this:
Incremental Cost Number | Approver Names
-------------------------------------------
S703401 | Ryan P Cassidy
S703401 | Christopher J Mattingly
S703401 | Frank E LaSota
S703401 | Ryan P Cassidy
S703401 | Anthony L Locricchio
S703401 | Jason Monte
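Also, the file I am importing is a comma-separated CSV file; only the specific column that holds multiple values is delimited by the pipe symbol. The same should work if I have multiple Incremental Cost Numbers.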
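I think you need to split the second column on "|" and then use the explode() function. Because split() treats its pattern argument as a regular expression, the pipe has to be escaped as "\\|". The same functions are available from Java via import static org.apache.spark.sql.functions.*;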
df.select(col("id"), explode(split(col("a"), "\\|")).as("a")).show()
+-------+--------------------+
| id| a|
+-------+--------------------+
|S703401| Ryan P Cassidy|
|S703401|Christopher J Mat...|
|S703401| Frank E|
Note: this is the RDD way of doing things. It may be easier in Scala with DataFrames.
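To see the row expansion that split() + explode() produce without starting Spark, here is a minimal plain-Java sketch (not Spark code) applied to lines in the question's format. It assumes a header-less, two-column line with no quoted commas; the class and method names (ExplodeSketch, explodeLine) are hypothetical. Each input line becomes one (cost number, approver) pair per pipe-separated name, so several Incremental Cost Numbers are handled simply by processing each line in turn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java sketch (no Spark) of what split(col, "\\|") + explode() do to one row.
// Assumption: a two-column, comma-separated line with no quoted fields.
public class ExplodeSketch {
    // Like Spark's split(), Pattern-based splitting uses a regex, so "|" is escaped.
    private static final Pattern PIPE = Pattern.compile("\\|");

    // "costNumber,name1|name2|..." -> one {costNumber, name} pair per name
    public static List<String[]> explodeLine(String csvLine) {
        // limit 2: everything after the first comma is the pipe-delimited column
        String[] fields = csvLine.split(",", 2);
        String costNumber = fields[0].trim();
        List<String[]> rows = new ArrayList<>();
        for (String name : PIPE.split(fields[1])) {
            rows.add(new String[] {costNumber, name.trim()});
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] lines = {
            "S703401,Ryan P Cassidy|Christopher J Mattingly|Frank E LaSota|Ryan P Cassidy|Anthony L Locricchio|Jason Monte"
        };
        for (String line : lines) {
            for (String[] row : explodeLine(line)) {
                System.out.println(row[0] + " | " + row[1]);
            }
        }
    }
}
```

In Spark itself, the equivalent should be reading the file with spark.read().option("header", "true").csv(path) and applying the split/explode shown above to the "Approver Names" column.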
If you have multiple such columns, you can do the following:
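import org.apache.spark.sql.functions._
import spark.implicits._ // spark is the SparkSession (predefined in spark-shell); provides .toDF on a Seq
val df = Seq(("S703401",
  "Ryan P Cassidy|Christopher J Mattingly|Frank E LaSota|Ryan P Cassidy|Anthony L Locricchio|Jason Monte",
  "xyz|mnp|abc")).toDF("Incremental Cost Number", "Approver Names", "3rd Column")
// split() takes a regex, so the pipe is escaped; each explode() emits one row per array element
df.withColumn("Approver Names", explode(split(col("Approver Names"), "\\|")))
  .withColumn("3rd Column", explode(split(col("3rd Column"), "\\|")))
  .show()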
The input DataFrame, as printed by df.show() (show() truncates long values to 20 characters):
+-----------------------+--------------------+-----------+
|Incremental Cost Number| Approver Names| 3rd Column|
+-----------------------+--------------------+-----------+
| S703401|Ryan P Cassidy|Ch...|xyz|mnp|abc|
+-----------------------+--------------------+-----------+
The result after both explode() calls:
+-----------------------+--------------------+----------+
|Incremental Cost Number| Approver Names|3rd Column|
+-----------------------+--------------------+----------+
| S703401| Ryan P Cassidy| xyz|
| S703401| Ryan P Cassidy| mnp|
| S703401| Ryan P Cassidy| abc|
| S703401|Christopher J Mat...| xyz|
| S703401|Christopher J Mat...| mnp|
| S703401|Christopher J Mat...| abc|
| S703401| Frank E LaSota| xyz|
| S703401| Frank E LaSota| mnp|
| S703401| Frank E LaSota| abc|
| S703401| Ryan P Cassidy| xyz|
| S703401| Ryan P Cassidy| mnp|
| S703401| Ryan P Cassidy| abc|
| S703401|Anthony L Locricchio| xyz|
| S703401|Anthony L Locricchio| mnp|
| S703401|Anthony L Locricchio| abc|
| S703401| Jason Monte| xyz|
| S703401| Jason Monte| mnp|
| S703401| Jason Monte| abc|
+-----------------------+--------------------+----------+
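Chaining two explode() calls does not pair the lists element by element: every row produced by the first explode is exploded again, giving every combination (6 names × 3 values = 18 rows in the output above). A minimal plain-Java sketch of that nesting, using a hypothetical DoubleExplodeSketch.explodeBoth helper rather than any Spark API:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Spark) of two chained explode() calls on one row:
// the result is the Cartesian product of the two pipe-delimited lists.
public class DoubleExplodeSketch {
    public static List<String[]> explodeBoth(String key, String first, String second) {
        List<String[]> rows = new ArrayList<>();
        for (String a : first.split("\\|")) {         // first explode
            for (String b : second.split("\\|")) {    // second explode, applied to every row of the first
                rows.add(new String[] {key, a, b});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = explodeBoth("S703401",
            "Ryan P Cassidy|Christopher J Mattingly|Frank E LaSota|Ryan P Cassidy|Anthony L Locricchio|Jason Monte",
            "xyz|mnp|abc");
        System.out.println(rows.size()); // 6 names x 3 values = 18 rows
        for (String[] r : rows) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

If a positional pairing (first name with xyz, second with mnp, and so on) is what you want instead, Spark 2.4 and later provide arrays_zip, which combines the split arrays element-wise so that a single explode yields one row per position; check that it is available in your Spark version.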