Save and Read Key-Value pair in Spark

I have a JavaPairRDD in the following format:

JavaPairRDD<String, Tuple2<String, List<String>>> myData;

I want to save it in a key-value format, (String, Tuple2<String, List<String>>):

myData.saveAsXXXFile("output-path");

My next job could then read the data directly into a JavaPairRDD:

JavaPairRDD<String, Tuple2<String, List<String>>> newData = context.XXXFile("output-path");

I am using Java 7, Spark 1.2, and the Java API. I tried saveAsTextFile and saveAsObjectFile; neither works. And I don't see a saveAsSequenceFile option in Eclipse.

Does anyone have a suggestion for this problem? Thank you very much!

You could use SequenceFileRDDFunctions, which Scala applies through implicits, but from Java that might be nastier than the usual suggestion:

myData.saveAsHadoopFile(fileName, Text.class, CustomWritable.class,
                        SequenceFileOutputFormat.class);

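where CustomWritable is a class you write yourself that implements the org.apache.hadoop.io.Writable interface. Because myData holds String keys and Tuple2 values, it has to be mapped to (Text, CustomWritable) pairs before this call.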
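Implementing Writable comes down to two methods that must mirror each other: write(DataOutput) and readFields(DataInput). As a rough illustration of that symmetry, the sketch below encodes a (String, List<String>) value with plain java.io streams, so it runs without Hadoop on the classpath. The class name PairRecord, its helper methods, and the use of writeUTF/readUTF are assumptions made for illustration; a real Writable would more likely use Text.writeString and Text.readString.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Writable; not part of Spark or Hadoop.
final class PairRecord {
  String label;
  List<String> items = new ArrayList<String>();

  // Kept because Hadoop-style deserialization creates an empty object first.
  PairRecord() {
  }

  PairRecord(String label, List<String> items) {
    this.label = label;
    this.items = new ArrayList<String>(items);
  }

  // The field order here defines the binary layout: label, item count, items.
  void write(DataOutput out) throws IOException {
    out.writeUTF(label);
    out.writeInt(items.size());
    for (String item : items) {
      out.writeUTF(item);
    }
  }

  // Must read exactly what write() wrote, in the same order.
  void readFields(DataInput in) throws IOException {
    label = in.readUTF();
    int count = in.readInt();
    items = new ArrayList<String>(count);
    for (int i = 0; i < count; i++) {
      items.add(in.readUTF());
    }
  }

  // Serializes a record to an in-memory byte array.
  static byte[] toBytes(PairRecord record) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      record.write(new DataOutputStream(buffer));
      return buffer.toByteArray();
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  // Rebuilds a record from bytes produced by toBytes().
  static PairRecord fromBytes(byte[] bytes) {
    try {
      PairRecord record = new PairRecord();
      record.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
      return record;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Round-tripping a value through PairRecord.toBytes and PairRecord.fromBytes is a cheap way to check that the two methods agree before running a Spark job. Note that writeUTF is limited to strings whose encoded form fits in 65535 bytes.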

Something like this should work (not checked for compilation); it is named MyWritable here and plays the role of CustomWritable above:

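import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

import scala.Tuple2;

// Writable is an interface, so it is implemented rather than extended.
public class MyWritable implements Writable {
  private String _1;
  private String[] _2;

  // Hadoop creates Writables reflectively and needs a no-argument constructor.
  public MyWritable() {
  }

  public MyWritable(Tuple2<String, String[]> data) {
    _1 = data._1();
    _2 = data._2();
  }

  public Tuple2<String, String[]> get() {
    return new Tuple2<String, String[]>(_1, _2);
  }

  // Reads the fields in the same order and encoding that write() uses.
  @Override
  public void readFields(DataInput in) throws IOException {
    _1 = Text.readString(in);
    int length = in.readInt();
    _2 = new String[length];
    for (int i = 0; i < length; i++) {
      _2[i] = Text.readString(in);
    }
  }

  // Writes the string, then the array length, then each element.
  @Override
  public void write(DataOutput out) throws IOException {
    Text.writeString(out, _1);
    out.writeInt(_2.length);
    for (String s : _2) {
      Text.writeString(out, s);
    }
  }
}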

Adapt the fields so that they fit your data model; the value in the question holds a List<String>, for example, rather than a String[].
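The question also asks how to read the data back, which the answer does not show. One possible end-to-end sketch, within the question's Java 7 and Spark 1.x constraints, converts each record to a (Text, MyWritable) pair, writes a SequenceFile, and loads it again with JavaSparkContext.sequenceFile. The class name PairStore and the method names save and load are hypothetical. The sketch assumes MyWritable implements Writable and has a public no-argument constructor, which Hadoop needs in order to instantiate it when reading. It needs Spark and Hadoop on the classpath and has not been compiled against a specific Spark version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

// Hypothetical helper; assumes the MyWritable class from the answer is available.
public final class PairStore {

  // Converts each record to (Text, MyWritable) and writes a SequenceFile.
  public static void save(JavaPairRDD<String, Tuple2<String, List<String>>> data, String path) {
    data.mapToPair(
        new PairFunction<Tuple2<String, Tuple2<String, List<String>>>, Text, MyWritable>() {
          @Override
          public Tuple2<Text, MyWritable> call(Tuple2<String, Tuple2<String, List<String>>> kv) {
            String[] items = kv._2()._2().toArray(new String[0]);
            MyWritable value =
                new MyWritable(new Tuple2<String, String[]>(kv._2()._1(), items));
            return new Tuple2<Text, MyWritable>(new Text(kv._1()), value);
          }
        })
        .saveAsHadoopFile(path, Text.class, MyWritable.class, SequenceFileOutputFormat.class);
  }

  // Reads the SequenceFile and converts each record back to plain Java values.
  public static JavaPairRDD<String, Tuple2<String, List<String>>> load(
      JavaSparkContext context, String path) {
    return context.sequenceFile(path, Text.class, MyWritable.class).mapToPair(
        new PairFunction<Tuple2<Text, MyWritable>, String, Tuple2<String, List<String>>>() {
          @Override
          public Tuple2<String, Tuple2<String, List<String>>> call(Tuple2<Text, MyWritable> kv) {
            // Copy into fresh objects: Hadoop reuses Writable instances between records.
            Tuple2<String, String[]> value = kv._2().get();
            List<String> items = new ArrayList<String>(Arrays.asList(value._2()));
            return new Tuple2<String, Tuple2<String, List<String>>>(
                kv._1().toString(), new Tuple2<String, List<String>>(value._1(), items));
          }
        });
  }
}
```

With this in place, the two placeholders in the question would become PairStore.save(myData, "output-path") in the first job and PairStore.load(context, "output-path") in the next one.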
