Skipping the first line of a .csv file in MapReduce (Java)
Since the mapper function runs once for every line, how can I skip the first line? Some files start with a column header row that I want to ignore.
When the mapper reads the file, each line arrives as a key-value pair. With the default TextInputFormat, the key is the byte offset at which the line starts in the file, and the value is the text of the line. For line 1 the offset is always zero. So in the mapper function, do the following:
@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // key.get() == 0 means this line starts at byte 0, i.e. it is the first line of the file.
    if (key.get() == 0 && value.toString().contains("header") /* some condition identifying the header */) {
        return; // skip the header line
    }
    // The rest of the data is processed here
}
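To make the offset reasoning concrete, here is a minimal sketch in plain Java, with no Hadoop classes involved. It only simulates the (offset, line) pairs a line-oriented input format would pass to map(), assuming UTF-8 text and a single '\n' line terminator. The class and method names (OffsetHeaderSkipDemo, lineOffsets, skipFirstLine) are hypothetical and chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of byte-offset keys; this is not Hadoop code.
class OffsetHeaderSkipDemo {

    // Computes the byte offset at which each line starts.
    // Assumption: UTF-8 encoding and one '\n' byte after every line.
    static List<Long> lineOffsets(List<String> lines) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            offsets.add(offset);
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return offsets;
    }

    // Applies the same test as the mapper above: drop the line whose offset is 0.
    static List<String> skipFirstLine(List<String> lines) {
        List<Long> offsets = lineOffsets(lines);
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (offsets.get(i) == 0L) {
                continue; // first line of the file
            }
            kept.add(lines.get(i));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("id,name", "1,alice", "2,bob");
        System.out.println(lineOffsets(lines));   // prints [0, 8, 16]
        System.out.println(skipFirstLine(lines)); // prints [1,alice, 2,bob]
    }
}
```

Only the line at offset 0 is dropped, so the check costs one comparison per record and does not depend on the header's content.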
Because the file can be stored across multiple nodes, we can't say on which machine the header is stored or which mapper is processing that part of the file. We can filter out the header in the Mapper itself. For this you have to know the header names. For example:
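// line is the record text, e.g. value.toString(); "header" stands for the known name of the first column
String[] cols = line.split(",");
if (cols[0].equals("header")) {
    // skip the header line
} else {
    // emit the record
}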
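The content-based filter can be tried out without a cluster. The following is a rough sketch in plain Java, assuming the first column's header name is known in advance. FIRST_COLUMN_NAME, isHeader and dropHeaders are hypothetical names, and the simple split(",") does not handle quoted fields that contain commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of filtering header rows by their known content.
class KnownHeaderFilterDemo {

    // Assumption: the header's first column is literally "id"; replace with the real name.
    static final String FIRST_COLUMN_NAME = "id";

    // Returns true when the first comma-separated field matches the known header name.
    static boolean isHeader(String line) {
        String[] cols = line.split(",", -1); // -1 keeps trailing empty fields
        return cols[0].trim().equals(FIRST_COLUMN_NAME);
    }

    // Keeps every line that is not recognised as a header.
    static List<String> dropHeaders(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!isHeader(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Two files' worth of lines, each starting with its own header row.
        List<String> lines = Arrays.asList("id,name", "1,alice", "id,name", "2,bob");
        System.out.println(dropHeaders(lines)); // prints [1,alice, 2,bob]
    }
}
```

One caveat of this approach: a data row whose first field happens to equal the header name would be dropped as well, so the comparison should be as specific as the data allows.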