How to Load data from CSV into separate Hadoop HDFS directories based on fields

I have a CSV file and I need to load its rows into HDFS directories based on the value of a certain field (year). I am planning to use Java. I have looked at using BufferedReader, but I am having trouble implementing it. Is this the right tool for the task, or is there a better way?
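A BufferedReader approach is workable for moderate file sizes. Here is a minimal sketch in plain Java that groups rows by a `year` column and writes one directory per year; the class name, paths, column name, and naive comma splitting are assumptions (a real CSV with quoted fields needs a proper parser), and it writes to the local filesystem — for HDFS you would open streams through Hadoop's `FileSystem` API instead of `java.nio.file.Files`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitCsvByYear {

    // Splits the rows of a CSV into one "year=<value>" directory per distinct year.
    // Assumes the first line is a header containing a "year" column and that
    // fields contain no embedded commas.
    public static void splitByYear(Path input, Path outputRoot) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String header = reader.readLine();
            if (header == null) {
                return; // empty file, nothing to do
            }
            int yearIdx = Arrays.asList(header.split(",")).indexOf("year");
            if (yearIdx < 0) {
                throw new IllegalArgumentException("no 'year' column in header");
            }

            // Collect rows keyed by their year value.
            Map<String, List<String>> byYear = new HashMap<>();
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",", -1);
                byYear.computeIfAbsent(fields[yearIdx], k -> new ArrayList<>()).add(line);
            }

            // Write one file per year, each under its own directory.
            for (Map.Entry<String, List<String>> e : byYear.entrySet()) {
                Path dir = outputRoot.resolve("year=" + e.getKey());
                Files.createDirectories(dir);
                List<String> out = new ArrayList<>();
                out.add(header); // keep the header in every output file
                out.addAll(e.getValue());
                Files.write(dir.resolve("part-0.csv"), out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        splitByYear(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

This buffers each year's rows in memory, which is fine for files that fit in RAM; for very large inputs you would instead keep one open writer per year and stream rows through.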

Use Spark to read the CSV into a DataFrame, then call partitionBy("year") when writing to HDFS. Spark will create a subdirectory under the output path named year=&lt;value&gt; for each distinct value of that column.
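A minimal sketch of this approach using Spark's Java API, assuming Spark is on the classpath and that the input has a header row with a `year` column; the paths and app name are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class PartitionCsvByYear {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("PartitionCsvByYear")
                .getOrCreate();

        // Read the CSV; header/inferSchema options and the path are assumptions.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/input.csv");

        // Writing with partitionBy("year") creates subdirectories such as
        // year=2020/, year=2021/, ... under the output path.
        df.write()
                .partitionBy("year")
                .mode(SaveMode.Overwrite)
                .csv("hdfs:///data/output");

        spark.stop();
    }
}
```

Note that Spark drops the partition column from the data files themselves, since its value is encoded in the directory name; readers that load the output path with Spark get the column back automatically.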
