Filter DataFrame with Regex with Spark in Scala

I want to filter a Spark DataFrame down to the rows whose Email column looks like a real address. Here's what I tried:

df.filter($"Email" match {case ".*@.*".r => true case _ => false})

But this doesn't work. What is the right way to do it?

To expand on @TomTom101's comment, the code you're looking for is:

df.filter($"Email" rlike ".*@.*")
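For anyone who wants to try this end to end, here is a minimal sketch. It assumes Spark 2.x or later on the classpath and a local SparkSession; the object name `RlikeExample`, the helper `keepEmailLike`, and the sample rows are made up for illustration. The helper uses `df("Email")`, which refers to the same column as `$"Email"` when `spark.implicits._` is in scope.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RlikeExample {
  // Hypothetical helper: keeps only rows whose Email value matches the regex.
  // rlike builds a Column expression that Spark evaluates against every row.
  def keepEmailLike(df: DataFrame): DataFrame =
    df.filter(df("Email") rlike ".*@.*")

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption, not from the question).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rlike-example")
      .getOrCreate()
    import spark.implicits._ // provides toDF and the $"..." column syntax

    // Made-up sample data: two values contain '@', one does not.
    val df = Seq("alice@example.com", "not-an-email", "bob@example.org").toDF("Email")

    // Displays the rows that passed the filter.
    keepEmailLike(df).show()

    spark.stop()
  }
}
```

Note that `.*@.*` only checks for the presence of an `@`, so it is a very loose notion of "real"; a stricter pattern can be dropped into the same call.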

The primary reason the match doesn't work is that DataFrame has two filter functions, which take either a String or a Column. A Scala match on $"Email" inspects the Column object itself rather than the values in each row, so it cannot produce either of those. This is unlike RDD, which has one filter that takes a function from T to Boolean.
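To make the difference concrete, here is a rough sketch that puts the three styles side by side. It assumes Spark 2.x or later; the object `FilterOverloads` and its method names are hypothetical. The RDD version is the only one where a pattern match in the spirit of the question applies, because there the function receives each Row rather than a Column.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object FilterOverloads {
  // Compiled regex, used as an extractor in the RDD version below.
  private val emailLike = ".*@.*".r

  // Column overload: the predicate is a Column expression.
  def byColumn(df: DataFrame): DataFrame =
    df.filter(df("Email") rlike ".*@.*")

  // String overload: the predicate is a SQL expression parsed by Spark.
  def byString(df: DataFrame): DataFrame =
    df.filter("Email rlike '.*@.*'")

  // RDD filter: an ordinary Scala function from Row to Boolean,
  // so a regex pattern match on the actual value works here.
  def byFunction(df: DataFrame): RDD[Row] =
    df.rdd.filter { row =>
      row.getAs[String]("Email") match {
        case emailLike() => true  // the whole string matches the regex
        case _           => false
      }
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-overloads")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val df = Seq("alice@example.com", "not-an-email").toDF("Email")

    println(byColumn(df).count())
    println(byString(df).count())
    println(byFunction(df).count())

    spark.stop()
  }
}
```

In practice the Column or String form is usually preferable on a DataFrame, since dropping to the RDD gives up the optimizations Spark applies to DataFrame expressions.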
