Java program to load multiple CSV files from HDFS, read column values and load them into HBase?
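I am new to big data and Hadoop.
I have a requirement where I have to upload 100 CSV files containing information (for example person information, i.e. name, age, city) to HDFS,
and then use a Java program to load the CSV files from HDFS, read the column values and load them into HBase.
Can you help me?
Parsing the files is fine. But I don't understand how to use Java to load multiple CSV files from HDFS into HBase.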
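I assume you already have your CSV files on HDFS. To read a file with Java, you need a Hadoop Configuration that points at your cluster's config files, a FileSystem obtained from that configuration, and a reader over the stream returned by FileSystem.open().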
The code to read a text file from HDFS could look like this:
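import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
Path file = new Path("/user/username/people-file.csv");

// try-with-resources closes the reader first, then the FileSystem.
// Note: FileSystem.get() returns a cached, shared instance, so only close it
// when nothing else in the same JVM still needs HDFS.
try (FileSystem hdfs = FileSystem.get(conf);
     BufferedReader br = new BufferedReader(
             new InputStreamReader(hdfs.open(file), StandardCharsets.UTF_8))) {
    String lineRead;
    while ((lineRead = br.readLine()) != null) {
        // Do whatever you need with the current line of data here:
        // split it into columns, map it into an object, add it to a collection, etc.
        System.out.println(lineRead);
    }
}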
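The snippet above reads a single file, while the question is about 100 of them. A minimal sketch, assuming all the CSV files were uploaded into one HDFS directory (the path /user/username/people below is hypothetical) and each line is simple CSV with no header and no quoted commas: list the directory with FileSystem.listStatus() and run the same read loop on every *.csv file. The class name HdfsCsvDirectoryReader and its helpers parseLine and readAllCsv are illustrative names, not part of Hadoop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HdfsCsvDirectoryReader {

    // Hypothetical helper: splits one CSV line into trimmed columns.
    // Assumes simple CSV (no quoted fields that contain commas).
    static String[] parseLine(String line) {
        String[] cols = line.split(",", -1); // -1 keeps trailing empty columns
        for (int i = 0; i < cols.length; i++) {
            cols[i] = cols[i].trim();
        }
        return cols;
    }

    // Reads every *.csv file directly under dir and returns all parsed rows.
    static List<String[]> readAllCsv(FileSystem hdfs, Path dir) throws IOException {
        List<String[]> rows = new ArrayList<>();
        for (FileStatus status : hdfs.listStatus(dir)) {
            Path path = status.getPath();
            if (!status.isFile() || !path.getName().endsWith(".csv")) {
                continue; // skip subdirectories and non-CSV files
            }
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(hdfs.open(path), StandardCharsets.UTF_8))) {
                String line;
                while ((line = br.readLine()) != null) {
                    if (!line.isEmpty()) {
                        rows.add(parseLine(line));
                    }
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        // Hypothetical directory holding the uploaded CSV files
        Path dir = new Path("/user/username/people");
        try (FileSystem hdfs = FileSystem.get(conf)) {
            List<String[]> rows = readAllCsv(hdfs, dir);
            System.out.println("Read " + rows.size() + " rows");
        }
    }
}
```

If your files have a header row, skip the first line of each file. For 100 small person files, collecting everything in a list is fine; for much larger inputs it would be safer to write each file's rows to HBase before moving on to the next file. FileSystem.globStatus(new Path(".../*.csv")) is an alternative to filtering listStatus() results by name.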
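Then, once you have the data in memory, you can save it into HBase. To do that, you need an HBase Configuration, a Connection created from it, a Table handle for the target table, and one Put per row.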
The code to insert data into HBase could look like this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

Configuration config = HBaseConfiguration.create();
config.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
config.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
String tableName = "people";

// You might have a similar array after reading the CSV files
// (columns: row key, first name, last name, email, gender, age)
String[][] people = {
    { "1", "Marcel", "Haddad", "marcel@xyz.com", "M", "26" },
    { "2", "Franklin", "Holtz", "franklin@xyz.com", "M", "24" },
    { "3", "Dwayne", "McKee", "dwayne@xyz.com", "M", "27" },
    { "4", "Rae", "Schroeder", "rae@xyz.com", "F", "31" },
    { "5", "Rosalie", "burton", "rosalie@xyz.com", "F", "25" },
    { "6", "Gabriela", "Ingram", "gabriela@xyz.com", "F", "24" } };

// try-with-resources closes the Table and then the Connection, even if a put fails
try (Connection connection = ConnectionFactory.createConnection(config);
     Table table = connection.getTable(TableName.valueOf(tableName))) {
    for (String[] p : people) {
        // The first column is used as the row key
        Put person = new Put(Bytes.toBytes(p[0]));
        person.addColumn(Bytes.toBytes("name"), Bytes.toBytes("first"), Bytes.toBytes(p[1]));
        person.addColumn(Bytes.toBytes("name"), Bytes.toBytes("last"), Bytes.toBytes(p[2]));
        person.addColumn(Bytes.toBytes("contact_info"), Bytes.toBytes("email"), Bytes.toBytes(p[3]));
        person.addColumn(Bytes.toBytes("personal_info"), Bytes.toBytes("gender"), Bytes.toBytes(p[4]));
        person.addColumn(Bytes.toBytes("personal_info"), Bytes.toBytes("age"), Bytes.toBytes(p[5]));
        table.put(person);
    }
}
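The sample array already carries an id in its first column, but CSV rows shaped like the question's (name, age, city) have none, and two Puts with the same row key write into the same row, so the later values replace the earlier ones when read. A minimal pure-Java sketch of one way to handle this, assuming a single column family named info and a row key built from the source file name plus the line number; PersonCells, Cell, rowKey, toCells and the info family are all hypothetical choices for illustration, not anything HBase requires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mapping from one CSV row {name, age, city} to HBase cells.
// The "info" column family is an assumption; it must exist in the table.
final class PersonCells {

    // One HBase cell to write: column family, qualifier, value.
    static final class Cell {
        final String family;
        final String qualifier;
        final String value;

        Cell(String family, String qualifier, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Builds a unique row key from the source file name and line number, because
    // the CSV has no id column. Zero-padding keeps one file's keys in line order.
    static String rowKey(String fileName, long lineNumber) {
        return fileName + ":" + String.format("%08d", lineNumber);
    }

    // cols is expected to be {name, age, city}.
    static List<Cell> toCells(String[] cols) {
        if (cols.length != 3) {
            throw new IllegalArgumentException("expected 3 columns, got " + cols.length);
        }
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("info", "name", cols[0]));
        cells.add(new Cell("info", "age", cols[1]));
        cells.add(new Cell("info", "city", cols[2]));
        return cells;
    }

    // With the HBase client available, each cell becomes one addColumn call:
    //   Put put = new Put(Bytes.toBytes(rowKey(fileName, lineNumber)));
    //   for (Cell c : toCells(cols)) {
    //       put.addColumn(Bytes.toBytes(c.family), Bytes.toBytes(c.qualifier),
    //                     Bytes.toBytes(c.value));
    //   }
    //   table.put(put);
    public static void main(String[] args) {
        String[] cols = {"Alice", "30", "Paris"};
        System.out.println(rowKey("people-001.csv", 7));
        for (Cell c : toCells(cols)) {
            System.out.println(c.family + ":" + c.qualifier + " = " + c.value);
        }
    }
}
```

Running main prints the key people-001.csv:00000007 followed by info:name = Alice, info:age = 30 and info:city = Paris. A single family is used here because HBase generally works best with a small number of column families.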
Note that before you start inserting data, you need to create the table (schema) in HBase, as described here: https://www.tutorialspoint.com/hbase/hbase_create_table.htm
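If you would rather create the table from the same Java program, a minimal sketch, assuming the HBase 2.x client API (TableDescriptorBuilder and ColumnFamilyDescriptorBuilder; 1.x clients use the older HTableDescriptor and HColumnDescriptor classes instead), creates the people table with the three column families the insert example writes to. It should be roughly equivalent to the shell command create 'people', 'name', 'contact_info', 'personal_info'. The class name CreatePeopleTable is illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

class CreatePeopleTable {

    // Column families used by the insert example
    static final String[] FAMILIES = {"name", "contact_info", "personal_info"};

    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        config.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
        config.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        TableName tableName = TableName.valueOf("people");

        try (Connection connection = ConnectionFactory.createConnection(config);
             Admin admin = connection.getAdmin()) {
            if (admin.tableExists(tableName)) {
                System.out.println("Table already exists: " + tableName);
                return;
            }
            // Describe the table: its name plus one descriptor per column family
            TableDescriptorBuilder builder = TableDescriptorBuilder.newBuilder(tableName);
            for (String family : FAMILIES) {
                builder.setColumnFamily(ColumnFamilyDescriptorBuilder.of(family));
            }
            admin.createTable(builder.build());
            System.out.println("Created table: " + tableName);
        }
    }
}
```

The tableExists() check makes the program safe to rerun, which is convenient if table creation and the data load live in the same job.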