Using Text data type in JavaRDD and returning void in FlatMap

I am trying to migrate Hadoop code to Spark. I already have some predefined functions that I should be able to reuse in Spark, since they are plain Java code without much Hadoop dependency. One function accepts input (spatial data: longitude, latitude) in Text format and converts it into a shape (polygons, line strings, etc.). In Spark, I first read each line of the file as a String and then convert it to Text so that I can use my existing function. I have two doubts. First, it seems that JavaRDD doesn't work with Text, and I am getting some problems because of that. Second, the function that converts Text to a shape doesn't return anything, so I am not able to use flatMap or any other mapping technique with it. I am not even sure whether my approach is correct.

Here is my code model:

/* Interface for converting between Text and a shape */
public interface TextSerializable {
    public Text toText(Text text);

    /**
     * Retrieve information from the given text.
     * @param text The text to parse
     */
    public void fromText(Text text);
}



/* The Shape interface looks something like this */

public interface Shape extends Writable, Cloneable, TextSerializable {
    /**
     * Returns minimum bounding rectangle for this shape.
     * @return The minimum bounding rectangle for this shape
     */
    public Rectangle getMBR();

    /**
     * Gets the distance of this shape to the given point.
     * @param x The x-coordinate of the point to compute the distance to
     * @param y The y-coordinate of the point to compute the distance to
     * @return The Euclidean distance between this object and the given point
     */
    ......
    ......
    ......

/*My code structure*/

SparkConf conf = new SparkConf().setAppName("XYZ").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);

final Text text = new Text();

JavaRDD<String> lines = sc.textFile("ABC.csv");

lines.foreach(new VoidFunction<String>() {
    public void call(String lines) {
        text.set(lines);
        System.out.println(text);
    }
});

/*Problem*/
text.flatMap(new FlatMapFunction<Text>(){
    public Iterable<Shape> call(Shape s){
        s.fromText(text);
        //return void;
    }

The last part of the code is wrong, but I don't know how to fix it. As far as I know, JavaRDD can be used with user-defined classes. I am also not sure whether the way I converted the String lines to Text is allowed in an RDD. I am completely new to Spark. Any kind of help would be great.

You are quite far off from the concept. First, you cannot call functions like map, flatMap, etc. on an arbitrary object: they are methods of an RDD such as JavaRDD, and Text is not a JavaRDD. Spark does support Text, but not in the way you used it.
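As an analogy only, the difference can be shown with plain Java collections instead of Spark. In this sketch, `Holder` is a hypothetical stand-in for Text, and `overwriteShared` and `mapEach` are made-up names: writing every line into one shared mutable object keeps just the last line, while mapping over the collection produces one object per line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java analogy; no Spark or Hadoop is needed to run it.
class SharedHolderSketch {

    // Hypothetical mutable holder, standing in for Hadoop's Text.
    static class Holder {
        String value = "";

        void set(String s) {
            value = s;
        }
    }

    // Mirrors the question's foreach: every line overwrites the same object.
    static Holder overwriteShared(List<String> lines) {
        Holder shared = new Holder();
        lines.forEach(shared::set);
        return shared;
    }

    // Mirrors a map over the collection: one new Holder per line.
    static List<Holder> mapEach(List<String> lines) {
        return lines.stream().map(s -> {
            Holder h = new Holder();
            h.set(s);
            return h;
        }).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,2", "3,4", "5,6");
        System.out.println(overwriteShared(lines).value); // only the last line survives
        System.out.println(mapEach(lines).size());        // one Holder per input line
    }
}
```

Spark's programming guide additionally warns that variables captured by a closure are copied to the executors, so updates made inside foreach are not reliably visible on the driver.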

Now, coming to your question: since you want to convert a String to Text, use something like this:

import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

SparkConf conf = new SparkConf().setAppName("Name of Application");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> logData = sc.textFile("replace with address of file");

/* map is called on the JavaRDD<String> logData, so its function receives a
   String and, here, returns a Text. You can replace the return statement with
   the logic of your toText function (new Text(s) is also a way to convert a
   String into a Text), but the function must return a value, so apply your
   logic accordingly. */
JavaRDD<Text> rddone = logData.map(new Function<String, Text>() {
    public Text call(String s) {
        // put the logic of your toText() function here
        return new Text(s);
    }
});

Now, when we call flatMap on the JavaRDD rddone, its function takes a Text as input, because rddone contains Text elements, and it can produce whatever output type you want.

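/* This flatMap function takes a Text as input and returns an iterator over the output objects.
   It needs java.util.Collections, java.util.Iterator and
   org.apache.spark.api.java.function.FlatMapFunction.
   Returning an Iterator matches Spark 2.x; Spark 1.x expected an Iterable. */
JavaRDD<Object> empty = rddone.flatMap(new FlatMapFunction<Text, Object>() {
    public Iterator<Object> call(Text te) {
        // here you can call your fromText(te) method on a shape object
        // and return an iterator over the result instead of an empty one.
        // Avoid returning null: Spark iterates over whatever is returned.
        return Collections.<Object>emptyIterator();
    }
});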
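Because fromText returns void and fills in the object it is called on, one way to use it inside a mapping function is to create a new shape, call fromText on it, and return that shape. The sketch below shows only this pattern in plain Java, so that it runs without Spark or Hadoop. `LineParsable`, `Point`, `parse` and `parseAll` are hypothetical names, a String stands in for Text, and a Java stream stands in for the RDD.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java sketch of the "create, fill in place, return" pattern.
class FromTextSketch {

    // Simplified stand-in for TextSerializable, using String instead of Text.
    interface LineParsable {
        void fromLine(String line);
    }

    // Hypothetical shape; a real project would use its own Shape classes.
    static class Point implements LineParsable {
        double x;
        double y;

        @Override
        public void fromLine(String line) {
            // Assumes a "x,y" line; no validation is done in this sketch.
            String[] parts = line.split(",");
            x = Double.parseDouble(parts[0].trim());
            y = Double.parseDouble(parts[1].trim());
        }
    }

    // One new Point per line: the void method mutates it, then it is returned.
    static Point parse(String line) {
        Point p = new Point();
        p.fromLine(line);
        return p;
    }

    // Stand-in for mapping over an RDD of lines.
    static List<Point> parseAll(List<String> lines) {
        return lines.stream().map(FromTextSketch::parse).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (Point p : parseAll(Arrays.asList("1.5,2.5", "3.0,4.0"))) {
            System.out.println(p.x + " " + p.y);
        }
    }
}
```

Under the same assumptions, the body of `parse` is what would go inside the `call` method of a Spark `Function<Text, Shape>` passed to `map`, which may fit better than flatMap when each line yields exactly one shape.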

Also refer to these links for more details: http://spark.apache.org/docs/latest/programming-guide.html

http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/api/java/JavaRDD.html
