Load multiple files into dataframe

Is it possible to load multiple files as one dataframe? Normally, if I have one file to load, I call, for example:

file1 = "/a/b/c/folder/file1.csv"
dc = sqlContext.read.format('com.databricks.spark.csv').options(header='false', inferschema='true').load(file1)

But I want to load all files under the folder, i.e. everything matching /a/b/c/folder/*.csv.

I think sqlContext.read.format('com.databricks.spark.csv').options(header='false', inferschema='true').load(folder) works. The error I got previously was because I was reading compressed files, and they were too large for the available memory.
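For what it's worth, here is a minimal sketch of the same idea using an explicit glob pattern instead of the bare folder. It assumes the path strings passed to load() are expanded as Hadoop-style glob patterns, and that sqlContext is an already-created SQLContext with the spark-csv package available. The helper names csv_glob, list_matching_files and load_folder are hypothetical, made up for illustration only.

```python
import glob


def csv_glob(folder):
    """Build a pattern matching every .csv file directly under `folder`."""
    return folder.rstrip("/") + "/*.csv"


def list_matching_files(folder):
    """Local sanity check: list the files the pattern would pick up.

    This only works for paths on the local filesystem, not for HDFS paths.
    """
    return sorted(glob.glob(csv_glob(folder)))


def load_folder(sqlContext, folder):
    """Load all matching CSV files into one dataframe.

    `sqlContext` is assumed to be an existing SQLContext; the reader
    options are the same ones used in the question.
    """
    return (sqlContext.read
            .format('com.databricks.spark.csv')
            .options(header='false', inferschema='true')
            .load(csv_glob(folder)))
```

Usage would be something like dc = load_folder(sqlContext, "/a/b/c/folder"). With header='false' the columns are matched by position, so this presumably only makes sense when all files share the same column layout.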
