
Scala: How to merge multiple CSV files into a data frame

I am writing the code below to load a CSV file into an RDD. I am able to store the data from one CSV file in an RDD, but I want to union multiple CSV files and store the result in a single RDD variable. How can I do that?

val Rdd = spark.sparkContext.textFile("File1.csv").map(_.split(","))

I am expecting something like:

val Rdd = spark.sparkContext.textFile("File1.csv").map(_.split(",")) union spark.sparkContext.textFile("File2.csv").map(_.split(","))

If you have a large number of files, I would suggest:

val rdd = List("file1", "file2", "file3", "file4", "file5")
  .map(spark.sparkContext.textFile(_))
  .reduce(_ union _)
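
As a variation on the snippet above, `SparkContext` also has a `union` method that takes a whole sequence of RDDs, and `textFile` accepts a comma-separated list of paths. The sketch below assumes a non-empty list of paths, none of which contains a comma; the helper names `readAll` and `readAllJoined` are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper: read each path as text, union everything in one call,
// then split each line on commas as in the question.
def readAll(sc: SparkContext, paths: Seq[String]): RDD[Array[String]] =
  sc.union(paths.map(sc.textFile(_)))
    .map(_.split(","))

// Hypothetical helper: same idea, relying on textFile accepting
// a comma-separated list of paths (assumes no path contains a comma).
def readAllJoined(sc: SparkContext, paths: Seq[String]): RDD[Array[String]] =
  sc.textFile(paths.mkString(","))
    .map(_.split(","))
```

Usage might look like `val rdd = readAll(spark.sparkContext, List("File1.csv", "File2.csv"))`. Note that a plain `split(",")` does not handle quoted fields that contain commas.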

Or if you only know you have 0 or more files:

val rdd = getListOfFilenames()
  .map(spark.sparkContext.textFile(_))
  .foldLeft(spark.sparkContext.emptyRDD[String])(_ union _)
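
Since the title asks about a data frame rather than an RDD, it may be simpler to let the DataFrame reader take all the paths at once. This is only a sketch: it assumes a `SparkSession` is available, that every file starts with a header row, and that all files share the same columns in the same order. `readCsvFiles` is a hypothetical helper name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: load several CSV files into a single DataFrame.
def readCsvFiles(spark: SparkSession, paths: Seq[String]): DataFrame =
  spark.read
    .option("header", "true") // assumption: each file begins with a header row
    .csv(paths: _*)           // DataFrameReader.csv takes a varargs list of paths
```

For example, `val df = readCsvFiles(spark, Seq("File1.csv", "File2.csv"))`. If the RDD is still needed, `df.rdd` gives an `RDD[Row]`.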
