
Spark regex while joining data frames

I need to write a regex for a join condition in Spark.

The regex should match strings like the following, where the left side comes from df1 and the right side is the corresponding value in df2:

n3_testindia1 = test-india-1
n2_stagamerica2 = stag-america-2
n1_prodeurope2 = prod-europe-2

df1.select("location1").distinct.show()

+----------------+
|    location1   |
+----------------+
|n3_testindia1   |
|n2_stagamerica2 |
|n1_prodeurope2  |
+----------------+

df2.select("loc1").distinct.show()

+--------------+
|      loc1    |
+--------------+
|test-india-1  |
|stag-america-2|
|prod-europe-2 |
+--------------+

I want to join on the location columns, something like this:

val joindf = df1.join(df2, df1("location1") == regex(df2("loc1")))

Based on the information above, you can do that in Spark 2.4.0 using:

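import org.apache.spark.sql.functions.{regexp_extract, translate}

// Compare the part of location1 after the first "_" with loc1 stripped of "-".
val joindf = df1.join(df2,
  regexp_extract(df1("location1"), """[^_]+_(.*)""", 1)
    === translate(df2("loc1"), "-", ""))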
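
To see why the two sides of that condition become equal, here is a minimal plain-Scala sketch that models the two column expressions on the sample values from the question. It does not use Spark; the object name `JoinKeyCheck` and the helpers `leftKey` and `rightKey` are hypothetical names introduced only for illustration.

```scala
// Plain-Scala model of the join key logic; no Spark required.
// JoinKeyCheck, leftKey and rightKey are illustrative names, not Spark APIs.
object JoinKeyCheck {
  // Same pattern as in the join condition: skip up to the first "_", capture the rest.
  private val Prefixed = """[^_]+_(.*)""".r

  // Models regexp_extract(col, pattern, 1), which yields "" when nothing matches.
  def leftKey(location1: String): String =
    Prefixed.findFirstMatchIn(location1).map(_.group(1)).getOrElse("")

  // Models translate(col, "-", ""), which deletes every "-".
  def rightKey(loc1: String): String = loc1.replace("-", "")

  def main(args: Array[String]): Unit = {
    // Sample pairs taken from the question.
    val pairs = Seq(
      "n3_testindia1"   -> "test-india-1",
      "n2_stagamerica2" -> "stag-america-2",
      "n1_prodeurope2"  -> "prod-europe-2"
    )
    for ((location1, loc1) <- pairs) {
      val l = leftKey(location1)
      val r = rightKey(loc1)
      println(s"$location1 -> $l | $loc1 -> $r | equal: ${l == r}")
    }
  }
}
```

Because each side of the comparison depends on only one of the two data frames, the condition is an ordinary equality between two derived keys rather than a row-by-row regex test across both tables.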

Or, in prior versions, something like:

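import org.apache.spark.sql.functions.{length, lit, translate}

// Drop the fixed three-character prefix (such as "n3_") from location1, then compare.
val joindf = df1.join(df2,
  df1("location1").substr(lit(4), length(df1("location1")))
    === translate(df2("loc1"), "-", ""))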

You can split location1 on "_" and take the second element, then compare it with loc1 after the "-" characters have been removed. Check this out:

scala> val df1 = Seq(("n3_testindia1"),("n2_stagamerica2"),("n1_prodeurope2")).toDF("location1")
df1: org.apache.spark.sql.DataFrame = [location1: string]

scala> val df2 = Seq(("test-india-1"),("stag-america-2"),("prod-europe-2")).toDF("loc1")
df2: org.apache.spark.sql.DataFrame = [loc1: string]

scala> df1.join(df2,split('location1,"_")(1) === regexp_replace('loc1,"-",""),"inner").show
+---------------+--------------+
|      location1|          loc1|
+---------------+--------------+
|  n3_testindia1|  test-india-1|
|n2_stagamerica2|stag-america-2|
| n1_prodeurope2| prod-europe-2|
+---------------+--------------+
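
The regex, substring, and split approaches all agree on the sample data, but they may differ on values shaped differently from the ones in the question. The following plain-Scala sketch models each key derivation without Spark and tries two hypothetical inputs that are not in the original data (`n3_test_india1` and `n10_testindia1`). The names `KeyVariants`, `viaRegex`, `viaSubstr`, and `viaSplit` are illustrative assumptions.

```scala
// Plain-Scala models of the three ways of deriving the key from location1.
// All names here are illustrative; none of them are Spark APIs.
object KeyVariants {
  private val Prefixed = """[^_]+_(.*)""".r

  // Models regexp_extract(location1, "[^_]+_(.*)", 1): everything after the first "_".
  def viaRegex(s: String): String =
    Prefixed.findFirstMatchIn(s).map(_.group(1)).getOrElse("")

  // Models location1.substr(lit(4), length(location1)): drops the first three characters.
  def viaSubstr(s: String): String = s.drop(3)

  // Models split(location1, "_")(1): only the second piece, if there is one.
  def viaSplit(s: String): Option[String] = s.split("_", -1).lift(1)

  def main(args: Array[String]): Unit = {
    // The first value is from the question; the other two are hypothetical edge cases.
    val samples = Seq("n3_testindia1", "n3_test_india1", "n10_testindia1")
    for (s <- samples)
      println(s"$s | regex: ${viaRegex(s)} | substr: ${viaSubstr(s)} | split: ${viaSplit(s)}")
  }
}
```

If the prefix length can vary, the substring variant is the fragile one; if the part after the prefix can itself contain "_", the split variant keeps only its first piece. For data that always looks like the three sample rows, any of the three should behave the same.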

