Filter a DataFrame with a regex with Spark in Scala
I want to filter the rows of a Spark DataFrame down to those whose Email column looks like a real email address. Here's what I tried:
df.filter($"Email" match {case ".*@.*".r => true case _ => false})
But this doesn't work. What is the right way to do it?
To expand on @TomTom101's comment, the code you're looking for is:
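import spark.implicits._ // enables the $"col" syntax; assumes a SparkSession named spark, as in spark-shell
df.filter($"Email" rlike ".*@.*")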
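For context, here is a minimal end-to-end sketch of the rlike approach. It assumes Spark 2.x or later (SparkSession), local mode, and a small hypothetical set of email strings; the object name RlikeFilterSketch and the stricter pattern are illustrative, not part of the original answer. Note that rlike performs an unanchored Java-regex search, so ".*@.*" keeps exactly the values that contain an "@". Getting closer to "looks like a real email" needs an anchored pattern such as the second one below.

```scala
import org.apache.spark.sql.SparkSession

object RlikeFilterSketch {
  // Returns the emails kept by a loose pattern and by a stricter (hypothetical) pattern.
  def run(): (Seq[String], Seq[String]) = {
    val spark = SparkSession.builder()
      .appName("rlike-filter-sketch")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()
    import spark.implicits._ // enables $"col", .toDF and .as[String]

    try {
      // Hypothetical sample data
      val df = Seq("alice@example.com", "not-an-email", "bob@mail.example.org", "carol@")
        .toDF("Email")

      // rlike is an unanchored regex search, so ".*@.*" keeps any value containing "@"
      val loose = df.filter($"Email" rlike ".*@.*")

      // Hypothetical stricter pattern: local@domain.tld, no whitespace, exactly one "@"
      val strict = df.filter($"Email" rlike """^[^@\s]+@[^@\s]+\.[^@\s]+$""")

      (loose.as[String].collect().toSeq, strict.as[String].collect().toSeq)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (loose, strict) = run()
    println(s"loose:  $loose")
    println(s"strict: $strict")
  }
}
```

With this sample data the loose filter keeps "carol@", while the anchored pattern rejects it because nothing follows the "@".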
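The primary reason the match doesn't work is that match is ordinary Scala pattern matching on the Column object $"Email" itself, not on each row's value, so it cannot express a per-row test. DataFrame (in the Spark 1.x API) has two filter functions, which take either a String (a SQL expression) or a Column, and the predicate has to be built as one of those. This is unlike RDD, whose single filter takes a function from T to Boolean, so a plain Scala predicate (including a match) works there.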
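The difference between the filter styles can be sketched as follows, again assuming Spark 2.x or later, local mode, and hypothetical sample data (FilterOverloadsSketch and EmailRegex are illustrative names). It applies the same predicate through the Column overload, the SQL-string overload, and an RDD filter that takes an ordinary Scala function, which is where a match expression like the one in the question does work. Since Spark 2.0 a DataFrame is a Dataset[Row], which also has a filter taking a function, so df.filter(row => ...) compiles too; the sketch goes through df.rdd to mirror the RDD comparison.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FilterOverloadsSketch {
  // Hypothetical pattern; as an extractor, a Scala Regex must match the whole string
  private val EmailRegex = ".*@.*".r

  // Returns the number of rows kept by each of the three filter styles.
  def run(): (Long, Long, Long) = {
    val spark = SparkSession.builder()
      .appName("filter-overloads-sketch")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()
    import spark.implicits._ // enables .toDF on a local Seq

    try {
      val df = Seq("alice@example.com", "not-an-email", "carol@").toDF("Email")

      // 1. filter(Column): a Column expression that Spark evaluates per row
      val byColumn = df.filter(col("Email").rlike(".*@.*"))

      // 2. filter(String): the same predicate written as a SQL expression string
      val byString = df.filter("Email RLIKE '.*@.*'")

      // 3. RDD.filter(T => Boolean): an ordinary Scala function, so pattern matching works.
      //    Option(...) guards against null values, which the Regex extractor cannot handle.
      val byFunction = df.rdd.filter { row =>
        Option(row.getAs[String]("Email")) match {
          case Some(EmailRegex()) => true
          case _                  => false
        }
      }

      (byColumn.count(), byString.count(), byFunction.count())
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

The Column and String forms are optimized by Spark's query planner, while the RDD function is opaque to it, so the Column form is usually the better default.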