Save and Read Key-Value pair in Spark
I have a JavaPairRDD in the following format:

JavaPairRDD<String, Tuple2<String, List<String>>> myData;

I want to save it in a key-value format (String, Tuple2<String, List<String>>):
myData.saveAsXXXFile("output-path");
So my next job could read the data directly back into my JavaPairRDD:

JavaPairRDD<String, Tuple2<String, List<String>>> newData = context.XXXFile("output-path");
I am using Java 7 and the Spark 1.2 Java API. I tried saveAsTextFile and saveAsObjectFile; neither works. And I don't see a saveAsSequenceFile option in my Eclipse. Does anyone have any suggestions for this problem? Thank you very much!
You could use SequenceFileRDDFunctions, which is used through implicits in Scala; however, that might be nastier than the usual suggestion for Java:
myData.saveAsHadoopFile(fileName, Text.class, CustomWritable.class,
SequenceFileOutputFormat.class);
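Note that saveAsHadoopFile expects the pair RDD's keys and values to already be Hadoop Writable types, so in practice you would first map your (String, Tuple2<String, List<String>>) pairs into (Text, CustomWritable) pairs. A rough, unverified sketch (assuming the MyWritable class shown later in this answer, which wraps a Tuple2<String, String[]>):

```java
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

// Sketch only: convert each pair into Writable types before saving.
JavaPairRDD<Text, MyWritable> writables = myData.mapToPair(
    new PairFunction<Tuple2<String, Tuple2<String, List<String>>>, Text, MyWritable>() {
        @Override
        public Tuple2<Text, MyWritable> call(
                Tuple2<String, Tuple2<String, List<String>>> kv) {
            List<String> values = kv._2._2;
            String[] arr = values.toArray(new String[values.size()]);
            return new Tuple2<Text, MyWritable>(
                    new Text(kv._1),
                    new MyWritable(new Tuple2<String, String[]>(kv._2._1, arr)));
        }
    });

writables.saveAsHadoopFile("output-path", Text.class, MyWritable.class,
        SequenceFileOutputFormat.class);
```

The anonymous PairFunction style is what the Java 7 Spark API requires; with Java 8 this would collapse to a lambda.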
after implementing CustomWritable by implementing org.apache.hadoop.io.Writable (it is an interface, not a class to extend).
Something like this should work (I did not check that it compiles):
public class MyWritable implements Writable {
    private String _1;
    private String[] _2;

    // Hadoop needs a no-arg constructor to instantiate the class
    // during deserialization.
    public MyWritable() {
    }

    public MyWritable(Tuple2<String, String[]> data) {
        _1 = data._1;
        _2 = data._2;
    }

    public Tuple2<String, String[]> get() {
        return new Tuple2<String, String[]>(_1, _2);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        _1 = WritableUtils.readString(in);
        // ArrayWritable has no no-arg constructor; the element type
        // must be supplied when reading.
        ArrayWritable _2Writable = new ArrayWritable(Text.class);
        _2Writable.readFields(in);
        _2 = _2Writable.toStrings();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Use WritableUtils for writing as well, so the format matches
        // the WritableUtils.readString call in readFields.
        WritableUtils.writeString(out, _1);
        ArrayWritable _2Writable = new ArrayWritable(_2);
        _2Writable.write(out);
    }
}
such that it fits your data model.
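For the read-back side of your next job, one possible sketch (again unverified) is to use hadoopFile with SequenceFileInputFormat and then unwrap the Writables into plain Java types:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

// Sketch only: read the (Text, MyWritable) sequence file back in.
JavaPairRDD<Text, MyWritable> raw = context.hadoopFile(
        "output-path", SequenceFileInputFormat.class,
        Text.class, MyWritable.class);

// Convert immediately back to plain Java types; Hadoop may reuse the
// same Writable instances across records, so don't hold onto them.
JavaPairRDD<String, Tuple2<String, List<String>>> newData = raw.mapToPair(
    new PairFunction<Tuple2<Text, MyWritable>, String, Tuple2<String, List<String>>>() {
        @Override
        public Tuple2<String, Tuple2<String, List<String>>> call(
                Tuple2<Text, MyWritable> kv) {
            Tuple2<String, String[]> t = kv._2.get();
            return new Tuple2<String, Tuple2<String, List<String>>>(
                    kv._1.toString(),
                    new Tuple2<String, List<String>>(t._1, Arrays.asList(t._2)));
        }
    });
```

The immediate mapToPair conversion also sidesteps the usual caching pitfall with reused Writable objects in Hadoop RDDs.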