
Apache Spark: Cannot find symbol error using array_contains

I am new to Apache Spark and am writing an application that parses a JSON file. One of the attributes in the JSON file is an array of strings. I want to run a query that selects a row if the array attribute does not contain the string "None". I found some solutions that use the array_contains method in the org.apache.spark.sql.functions package. However, when I attempt to build my application I get the following cannot find symbol error:

cannot find symbol

I am using Apache Spark 2.0 and Maven to build my project. The code that I am attempting to compile:

import java.util.List;

import scala.Tuple2;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

import static org.apache.spark.sql.functions.col;
public class temp {

    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("testSpark")
                .enableHiveSupport()    
                .getOrCreate();

        Dataset<Row> df = spark.read().json("hdfs://XXXXXX.XXX:XXX/project/term/project.json");
        df.printSchema();
        Dataset<Row> filteredDF = df.select(col("user_id"),col("elite"));
        df.createOrReplaceTempView("usersTable");
        String val[] = {"None"};
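        // This is the line the compiler rejects: array_contains is not in scope,
        // since only functions.col was statically imported above.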
        Dataset<Row> newDF = df.select(col("user_id"),col("elite").where(array_contains(col("elite"),"None")));
        newDF.show(10);
        JavaRDD<Row> users = filteredDF.toJavaRDD();
    }
}

Below is my pom.xml file:

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>term</groupId>
  <artifactId>Data</artifactId>
  <version>0.0.1</version>

  <!-- specify java version needed??? -->
  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <!-- overriding src/java directory... -->
  <build>
    <sourceDirectory>src/</sourceDirectory>
  </build>

  <!-- telling it to create a jar -->
  <packaging>jar</packaging>


  <!-- DEPENDENCIES -->
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.3</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.11.7</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>2.0.0</version>
    </dependency>
  </dependencies>
</project>

Change

Dataset<Row> newDF = df.select(col("user_id"),col("elite").where(
        array_contains(col("elite"),"None")));

to add the functions class as a qualifier, like

Dataset<Row> newDF = df.select(col("user_id"),col("elite").where(
        functions.array_contains(col("elite"),"None")));

or, use a static import:

import static org.apache.spark.sql.functions.array_contains;
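
With the static import in place, the whole query can also be expressed directly on the Dataset. A minimal sketch, assuming the user_id/elite schema from the question; note that where(...) is a Dataset method, so it is called on df rather than on a Column, and not(...) inverts the test to match the "does not contain 'None'" requirement:

import static org.apache.spark.sql.functions.array_contains;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.not;

// Keep rows whose "elite" array does not contain "None",
// then project the two columns of interest.
Dataset<Row> newDF = df
        .where(not(array_contains(col("elite"), "None")))
        .select(col("user_id"), col("elite"));
newDF.show(10);

As an aside, compiling Dataset/SparkSession code at all requires the spark-sql artifact (e.g. spark-sql_2.10 version 2.0.0, matching the spark-streaming_2.10 entry) among the dependencies, which the posted pom.xml does not list.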
