How to Load data from CSV into separate Hadoop HDFS directories based on fields

I have a CSV file and I need to load its rows into HDFS directories based on the value of a certain field (year). I am planning to use Java. I have looked at using BufferedReader, but I am having trouble implementing it. Is this the right tool for the task, or is there a better way?
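A BufferedReader approach is workable for moderate file sizes. Here is a minimal sketch in plain Java that groups rows by a `year` column and writes one directory per year; the class name, paths, column name, and naive comma splitting are assumptions (a real CSV with quoted fields needs a proper parser), and it writes to the local filesystem — for HDFS you would open streams through Hadoop's `FileSystem` API instead of `java.nio.file.Files`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitCsvByYear {

    // Splits the rows of a CSV into one "year=<value>" directory per distinct year.
    // Assumes the first line is a header containing a "year" column and that
    // fields contain no embedded commas.
    public static void splitByYear(Path input, Path outputRoot) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String header = reader.readLine();
            if (header == null) {
                return; // empty file, nothing to do
            }
            int yearIdx = Arrays.asList(header.split(",")).indexOf("year");
            if (yearIdx < 0) {
                throw new IllegalArgumentException("no 'year' column in header");
            }

            // Collect rows keyed by their year value.
            Map<String, List<String>> byYear = new HashMap<>();
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",", -1);
                byYear.computeIfAbsent(fields[yearIdx], k -> new ArrayList<>()).add(line);
            }

            // Write one file per year, each under its own directory.
            for (Map.Entry<String, List<String>> e : byYear.entrySet()) {
                Path dir = outputRoot.resolve("year=" + e.getKey());
                Files.createDirectories(dir);
                List<String> out = new ArrayList<>();
                out.add(header); // keep the header in every output file
                out.addAll(e.getValue());
                Files.write(dir.resolve("part-0.csv"), out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        splitByYear(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

This buffers each year's rows in memory, which is fine for files that fit in RAM; for very large inputs you would instead keep one open writer per year and stream rows through.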

Use Spark to read the CSV into a DataFrame, then call partitionBy("year") when writing to HDFS. Spark will create a subdirectory under the output path named year=&lt;value&gt; for each distinct value of that column.
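A minimal sketch of this approach using Spark's Java API, assuming Spark is on the classpath and that the input has a header row with a `year` column; the paths and app name are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class PartitionCsvByYear {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("PartitionCsvByYear")
                .getOrCreate();

        // Read the CSV; header/inferSchema options and the path are assumptions.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/input.csv");

        // Writing with partitionBy("year") creates subdirectories such as
        // year=2020/, year=2021/, ... under the output path.
        df.write()
                .partitionBy("year")
                .mode(SaveMode.Overwrite)
                .csv("hdfs:///data/output");

        spark.stop();
    }
}
```

Note that Spark drops the partition column from the data files themselves, since its value is encoded in the directory name; readers that load the output path with Spark get the column back automatically.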
