Java - Spark SQL DataFrame map function is not working

Question

In Spark SQL, when I try to use the map function on a DataFrame, I get the error below.

The method map(Function1, ClassTag) in the type DataFrame is not applicable for the arguments (new Function(){})

I am following the Spark 1.3 documentation as well. https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection Does anyone have a solution?

Here is my testing code.

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

List<String> teenagerNames = teenagers.map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();

Answer 1

Change this to:

Java 6 & 7

List<String> teenagerNames = teenagers.javaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();

Java 8

List<String> t2 = teenagers.javaRDD().map(
    row -> "Name: " + row.getString(0)
).collect();

Once you call javaRDD() it works just like any other RDD map function.

This works with Spark 1.3.0 and up.
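
The error message in the question says that `DataFrame.map` expects a `Function1` and a `ClassTag`, whereas the anonymous class passed in is a Java `Function`, which is what `JavaRDD.map` accepts. The sketch below models that split so it can run without Spark on the classpath. `JavaFunction`, `JavaStyleRdd` and `ScalaStyleFrame` are hypothetical stand-ins written for this illustration, not Spark classes, and the row type is simplified to `String`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapOverloadSketch {

    /** Hypothetical stand-in for the Java-facing Function interface. */
    interface JavaFunction<T, R> {
        R call(T value) throws Exception;
    }

    /** Hypothetical stand-in for the wrapper returned by javaRDD(). */
    static final class JavaStyleRdd<T> {
        private final List<T> rows;

        JavaStyleRdd(List<T> rows) {
            this.rows = rows;
        }

        /** Accepts the Java-facing function type, like JavaRDD.map does. */
        <R> JavaStyleRdd<R> map(JavaFunction<T, R> f) {
            List<R> out = new ArrayList<>();
            for (T row : rows) {
                try {
                    out.add(f.call(row));
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return new JavaStyleRdd<>(out);
        }

        List<T> collect() {
            return new ArrayList<>(rows);
        }
    }

    /** Hypothetical stand-in for DataFrame: it offers no map(JavaFunction). */
    static final class ScalaStyleFrame<T> {
        private final List<T> rows;

        ScalaStyleFrame(List<T> rows) {
            this.rows = rows;
        }

        JavaStyleRdd<T> javaRDD() {
            return new JavaStyleRdd<>(rows);
        }
    }

    public static List<String> teenagerNames(List<String> names) {
        ScalaStyleFrame<String> teenagers = new ScalaStyleFrame<>(names);
        // Calling teenagers.map(...) directly would not compile in this model,
        // because only the wrapper from javaRDD() accepts a JavaFunction.
        return teenagers.javaRDD().map(
            new JavaFunction<String, String>() {
                public String call(String name) {
                    return "Name: " + name;
                }
            }).collect();
    }

    public static void main(String[] args) {
        // prints [Name: Justin, Name: Andy]
        System.out.println(teenagerNames(Arrays.asList("Justin", "Andy")));
    }
}
```

In the real API the same shape applies: the Java function type is accepted on the object returned by `javaRDD()`, not on the DataFrame itself.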

Answer 2

There is no need to convert to an RDD, which delays the execution. With the Dataset API it can be done directly, as below:

// Needs MapFunction from org.apache.spark.api.java.function, plus
// Dataset, Encoder, Encoders and Row from org.apache.spark.sql.
public static void mapMethod() {
    // Read the data from a file on the classpath.
    // Assumes a SparkSession named sparkSession is already in scope.
    Dataset<Row> df = sparkSession.read().json("file1.json");
    Encoder<String> encoder = Encoders.STRING();

    // Prior to Java 1.8
    List<String> rowsList = df.map(new MapFunction<Row, String>() {
        private static final long serialVersionUID = 1L;

        @Override
        public String call(Row row) throws Exception {
            return "string:>" + row.getString(0) + "<";
        }
    }, encoder).collectAsList();

    // From Java 1.8 onwards. The cast selects the MapFunction overload of map,
    // which avoids an ambiguous-call error on some Spark/Scala builds.
    List<String> rowsList1 = df.map(
        (MapFunction<Row, String>) row -> "string >" + row.getString(0) + "<",
        encoder).collectAsList();

    System.out.println(">>> " + rowsList);
    System.out.println(">>> " + rowsList1);
}

Answer 3

Do you have the correct dependency set in your pom? Set this and try again:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>

Answer 4

Try this:

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

List<String> teenagerNames = teenagers.toJavaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();

You have to convert your DataFrame to a JavaRDD.

Answer 5

Check that you are using the correct import for Row (import org.apache.spark.sql.Row) and remove any other imports related to Row. Otherwise your syntax is correct.

Answer 6

Please check your input file's data against your DataFrame SQL query. I faced the same thing, and when I looked back at the data, it did not match my query, so you are probably facing the same issue. Both toJavaRDD() and javaRDD() work.

6 answers

solution1
12 2015-05-05 17:44:22

solution2
1 2017-12-23 03:55:38

solution3
0 2015-04-22 13:38:55

solution4
0 2015-06-04 10:01:12

solution5
0 2016-02-04 11:02:47

solution6
0 2016-04-12 06:45:34