I am trying to join two DataFrames and then apply a `like` operation on the result, but it is not returning any rows. I want to do a pattern match here. Any suggestion on what I am doing wrong?
```scala
import org.apache.spark._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val upcTable = spark.sqlContext.sparkContext.parallelize(Seq(
  Row(1, 50, 100),
  Row(2, 60, 200),
  Row(36, 70, 300),
  Row(45, 80, 400)
))
val lookupUpc = spark.sqlContext.sparkContext.parallelize(Seq(
  Row(3, 70, 300),
  Row(4, 80, 400)
))
val upcDf = spark.sqlContext.createDataFrame(upcTable, StructType(Seq(
  StructField("U_ID", StringType, nullable = false),
  StructField("V_ID", IntegerType, nullable = false),
  StructField("R_ID", IntegerType, nullable = false))))
val lookupDf = spark.sqlContext.createDataFrame(lookupUpc, StructType(Seq(
  StructField("U_ID", StringType, nullable = false),
  StructField("V_ID", IntegerType, nullable = false))))
lookupDf.show()

val joinDf = upcDf.join(lookupDf, Seq("V_ID"), "inner").
  filter(upcDf("U_ID").like("%lookupDf(U_ID)")).
  select(upcDf("U_ID"), upcDf("V_ID"), upcDf("R_ID")).
  show()
```
Here I want to get 36 and 45 from `upcDf`.
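Rather than the column method `like`, which expects a literal String (so your pattern `"%lookupDf(U_ID)"` only matches values ending in the text `lookupDf(U_ID)`), the method `contains`, which takes an argument of type `Any` (hence also a `Column`), would be more suitable in your case: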
```scala
val joinDf = upcDf.join(lookupDf, Seq("V_ID"), "inner").
  where(upcDf("U_ID").contains(lookupDf("U_ID"))).
  select(upcDf("U_ID"), upcDf("V_ID"), upcDf("R_ID"))

joinDf.show
// +----+----+----+
// |U_ID|V_ID|R_ID|
// +----+----+----+
// |  45|  80| 400|
// |  36|  70| 300|
// +----+----+----+
```
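Note that, based on the listed schemas, column `U_ID` in your sample dataset should be of String type (e.g. `Row("36", 70, 300)`), and the rows of `lookupUpc` should contain only the two values that `lookupDf`'s schema declares (e.g. `Row("3", 70)`).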
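For reference, here is a self-contained sketch of the `contains` approach with the sample data adjusted to the schemas: `U_ID` values are Strings and the lookup rows carry two values. It assumes a local `SparkSession` (in `spark-shell` you would reuse the existing `spark`); the object and method names are hypothetical. It should keep only the rows with `U_ID` 36 and 45.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object ContainsJoinExample {
  // Builds the sample data (U_ID as String, two-value lookup rows) and applies the contains filter.
  def joinByContains(spark: SparkSession): DataFrame = {
    val upcDf = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(
        Row("1", 50, 100),
        Row("2", 60, 200),
        Row("36", 70, 300),
        Row("45", 80, 400)
      )),
      StructType(Seq(
        StructField("U_ID", StringType, nullable = false),
        StructField("V_ID", IntegerType, nullable = false),
        StructField("R_ID", IntegerType, nullable = false))))

    val lookupDf = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(
        Row("3", 70),
        Row("4", 80)
      )),
      StructType(Seq(
        StructField("U_ID", StringType, nullable = false),
        StructField("V_ID", IntegerType, nullable = false))))

    // Join on V_ID, then keep rows whose upc U_ID contains the lookup U_ID (passed as a Column).
    upcDf.join(lookupDf, Seq("V_ID"), "inner")
      .where(upcDf("U_ID").contains(lookupDf("U_ID")))
      .select(upcDf("U_ID"), upcDf("V_ID"), upcDf("R_ID"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: running locally; in spark-shell use the existing `spark` session instead.
    val spark = SparkSession.builder().master("local[*]").appName("contains-join").getOrCreate()
    try joinByContains(spark).orderBy("U_ID").show()
    finally spark.stop()
  }
}
```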
[UPDATE]
Per the requirement clarified in the comments, if you want to limit the match to the leading character only, I would suggest using the function `regexp_extract` (from `org.apache.spark.sql.functions`) and replacing the `where` clause above with the following:
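```scala
import org.apache.spark.sql.functions.regexp_extract

// keep rows whose first U_ID character equals the lookup U_ID
where(lookupDf("U_ID") === regexp_extract(upcDf("U_ID"), "^(.)", 1))
```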
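A self-contained sketch of the leading-character variant, again assuming a local `SparkSession` and String `U_ID` values; the object and method names are hypothetical. It includes one extra row, `Row("53", 70, 500)`, that is *hypothetical* and not part of the question's data: `"53"` contains `"3"`, so `contains` would keep it, but its first character is `"5"`, so this filter drops it and only 36 and 45 remain.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.regexp_extract
import org.apache.spark.sql.types._

object LeadingCharJoinExample {
  def joinByLeadingChar(spark: SparkSession): DataFrame = {
    val upcDf = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(
        Row("1", 50, 100),
        Row("2", 60, 200),
        Row("36", 70, 300),
        Row("45", 80, 400),
        Row("53", 70, 500) // hypothetical row: contains "3" but does not start with it
      )),
      StructType(Seq(
        StructField("U_ID", StringType, nullable = false),
        StructField("V_ID", IntegerType, nullable = false),
        StructField("R_ID", IntegerType, nullable = false))))

    val lookupDf = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(
        Row("3", 70),
        Row("4", 80)
      )),
      StructType(Seq(
        StructField("U_ID", StringType, nullable = false),
        StructField("V_ID", IntegerType, nullable = false))))

    // regexp_extract(col, "^(.)", 1) returns capture group 1, i.e. the first character of U_ID.
    upcDf.join(lookupDf, Seq("V_ID"), "inner")
      .where(lookupDf("U_ID") === regexp_extract(upcDf("U_ID"), "^(.)", 1))
      .select(upcDf("U_ID"), upcDf("V_ID"), upcDf("R_ID"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: running locally; in spark-shell use the existing `spark` session instead.
    val spark = SparkSession.builder().master("local[*]").appName("leading-char-join").getOrCreate()
    try joinByLeadingChar(spark).orderBy("U_ID").show()
    finally spark.stop()
  }
}
```

`Column.startsWith` also accepts a `Column`, so `upcDf("U_ID").startsWith(lookupDf("U_ID"))` is another option; note that it tests whether `U_ID` begins with the *entire* lookup value, which coincides with the first-character comparison only when the lookup values are single characters.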