Load multiple files into dataframe
Is it possible to load multiple files as one dataframe? Normally, if I have one file to load, I call, for example:
file1 = "/a/b/c/folder/file1.csv"
dc = sqlContext.read.format('com.databricks.spark.csv').options(header='false', inferschema='true').load(file1)
But I want to load all files under the folder, i.e. /a/b/c/folder/*.csv
I think
sqlContext.read.format('com.databricks.spark.csv').options(header='false', inferschema='true').load(folder)
works. The error I got previously was because I was reading compressed files, and they were oversized compared with the available memory.
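For anyone on a newer Spark, here is a minimal sketch of the same idea, assuming Spark 2.0+ where the CSV reader is built in and `spark.read.csv` replaces the `com.databricks.spark.csv` format. The helper name `load_csv_folder`, the app name, and the `SparkSession` variable are illustrative only and not from the original post.

```python
def load_csv_folder(spark, pattern="/a/b/c/folder/*.csv"):
    """Read every CSV file matching `pattern` into a single DataFrame.

    `spark` is assumed to be a SparkSession (Spark 2.0+).
    """
    # DataFrameReader.csv accepts a single path, a glob pattern,
    # a directory, or a list of paths.
    return spark.read.csv(pattern, header=False, inferSchema=True)


def main():
    # Imported here so the helper above can be defined without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("load-many-csv").getOrCreate()
    df = load_csv_folder(spark)
    df.printSchema()
    spark.stop()
```

Calling `main()` on a machine with Spark available should print the inferred schema for all matched files combined. Passing the folder path itself instead of the glob should also work, as long as the folder contains only the CSV files you want.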