Filter a DataFrame with Regex using Spark in Scala

Question

I want to filter the rows of a Spark DataFrame, keeping those whose Email column looks like a real address. Here's what I tried:

df.filter($"Email" match {case ".*@.*".r => true case _ => false})

But this doesn't work. What is the right way to do it?

Answer 1

To expand on @TomTom101's comment, the code you're looking for is:

df.filter($"Email" rlike ".*@.*")

The primary reason the match doesn't work is that DataFrame has two filter functions, which take either a String or a Column. The match expression is evaluated once, against the Column object itself rather than against each row's value, so it yields a plain Boolean that neither overload accepts. This is unlike RDD, which has one filter that takes a function from T to Boolean.
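To make the difference between the two filter styles concrete, here is a minimal sketch that applies the same check three ways: through the Column overload, through the String overload, and through an RDD-style function where a Scala regex pattern match is legal. It assumes Spark 2.x or later (for SparkSession) running locally; the object name, the looksLikeEmail helper, and the sample rows are made up for illustration and are not part of the original question.

```scala
import org.apache.spark.sql.SparkSession

object EmailFilterSketch {
  // Same loose pattern as in the answer: anything containing an '@'.
  val emailLike = ".*@.*".r

  // Hypothetical helper: a plain String => Boolean function, the kind RDD.filter accepts.
  def looksLikeEmail(s: String): Boolean = s match {
    case emailLike() => true
    case _           => false
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("email-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val df = Seq("alice@example.com", "not-an-email", "bob@example.org").toDF("Email")

    // Column overload: rlike builds an expression that Spark evaluates per row.
    val byColumn = df.filter($"Email" rlike ".*@.*")

    // String overload: the same predicate written as a SQL expression.
    val byString = df.filter("Email rlike '.*@.*'")

    // RDD-style filter: takes a function from the element type to Boolean,
    // so the regex pattern match runs on each value.
    val byFunction = df.rdd.map(_.getString(0)).filter(looksLikeEmail)

    byColumn.show()
    byString.show()
    byFunction.collect().foreach(println)

    spark.stop()
  }
}
```

With these sample rows, all three variants keep the two values that contain an @ and drop the third. The Column and String forms are usually preferable on a DataFrame, since the predicate stays an expression that Spark's optimizer can see, whereas the function form is opaque to it.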

Answer 1: 29 votes, accepted, 2015-11-27 22:05:53
