How to join a CSV file with a table in Hive

Question

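I'm having trouble with a join in Spark. I have loaded data from some CSV files and want to join it with a table in Hive.

I tried to do this following the documentation, but without success.

I define the table as

// spark is an existing SparkSession
Dataset<Row> table = spark.sql(query);

and the data I want to join with it as

Dataset<Row> data = spark
    .read()
    .format("csv")
    .option("header", true)
    .option("inferSchema", true)
    .load(path1, path2);

This is what I have tried:

data.join(table, data.col("id1").equalTo(table.col("id2")), "left");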

Answer 1

You should try joinWith:

data.joinWith(table, data.col("id1").equalTo(table.col("id2")), "left");

Reference: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-joins.html

Edit:

Use left_outer instead of left: left is not among the joinType values listed in the documentation quoted below, and there is absolutely no difference between LEFT JOIN and LEFT OUTER JOIN.

data.join(table, data.col("id1").equalTo(table.col("id2")), "left_outer");

Reference: https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html

join

public Dataset<Row> join(Dataset<?> right,
                scala.collection.Seq<String> usingColumns,
                String joinType)
Equi-join with another DataFrame using the given columns.
Different from other join functions, the join columns will only appear once in the output, i.e. similar to SQL's JOIN USING syntax.

Parameters:
right - Right side of the join operation.
usingColumns - Names of the columns to join on. This columns must exist on both sides.
joinType - One of: inner, outer, left_outer, right_outer, leftsemi.

Answer 2

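OK, I found the answer. The problem was the schema. When you want to use a CSV in Spark, you need to define the schema explicitly. When you join tables, the join key must also be declared in that schema, even if you do not want to keep the field in the output. Otherwise the join will not work.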
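
The answer above does not include code, so here is a minimal sketch of what an explicit schema could look like. Everything beyond id1 and id2 is an assumption made for illustration: the value and label columns, the table name some_db.some_table, and the class and method names are hypothetical, and the real CSV layout is not given in the question.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class CsvHiveJoinSketch {

    // Hypothetical CSV layout. The join key id1 is declared here even though
    // it is dropped from the output later.
    public static final StructType CSV_SCHEMA = new StructType()
            .add("id1", DataTypes.LongType)
            .add("value", DataTypes.StringType);

    // Reads the CSV files with the explicit schema (instead of inferSchema)
    // and left-outer-joins them with the given table on id1 = id2.
    public static Dataset<Row> joinCsvWithTable(
            SparkSession spark, Dataset<Row> table, String... paths) {
        Dataset<Row> data = spark.read()
                .format("csv")
                .option("header", true)
                .schema(CSV_SCHEMA)
                .load(paths);
        return data.join(
                table, data.col("id1").equalTo(table.col("id2")), "left_outer");
    }

    public static void main(String[] args) {
        // Assumes Spark was built with Hive support and that the CSV paths
        // are passed as program arguments.
        SparkSession spark = SparkSession.builder()
                .appName("csv-hive-join-sketch")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical query standing in for the one in the question.
        Dataset<Row> table = spark.sql("SELECT id2, label FROM some_db.some_table");

        Dataset<Row> joined = joinCsvWithTable(spark, table, args);

        // The key had to be in the schema for the join, but can be dropped afterwards.
        joined.drop("id1").show();
        spark.stop();
    }
}
```

If the key types on the two sides differ (for example a string id1 against a numeric id2), the declared type in CSV_SCHEMA is the place to line them up; which type is right depends on the Hive table, which the question does not show.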

Answer 1
0 2019-04-09 10:15:08

Answer 2
0 2019-04-09 11:48:44
