How to create a single DataFrame from the CSV files in an S3 folder and its subfolders in PySpark
Hi, I'm very new to PySpark and S3, and I have a problem at hand. I have an S3 folder that contains subfolders and CSV files, and the subfolders contain CSV files as well. I need to read the contents of all of these files into a single DataFrame (or a single CSV file), which later needs to be loaded into a table in Postgres.

Can anyone please help me? I have code in plain Python, but I'm not sure how to go about this with PySpark and S3.