
Filter spark DataFrame on string contains

I am using Spark 1.3.0 and Spark Avro 1.0.0. I am working from the example on the repository page. The following code works well:

import com.databricks.spark.avro._  // needed for the implicit .avro read/write methods

val df = sqlContext.read.avro("src/test/resources/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")

But what if I needed to see if the doctor string contains a substring? Since we are writing our expression inside of a string, what do I do to do a "contains"?

You can use contains (this works with an arbitrary sequence):

df.filter($"foo".contains("bar"))

like (SQL LIKE, with SQL simple regular expressions where _ matches an arbitrary character and % matches an arbitrary sequence):

df.filter($"foo".like("bar"))

or rlike (like with Java regular expressions):

df.filter($"foo".rlike("bar"))

depending on your requirements. LIKE and RLIKE should work with SQL expressions as well.
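Since the question builds its predicate inside a plain string, here is a minimal sketch of the same operators used in a SQL expression string passed to filter (the column name foo and the patterns are just placeholders):

df.filter("foo LIKE '%bar%'")   // SQL LIKE pattern inside a string expression
df.filter("foo RLIKE 'bar$'")   // Java-style regular expression inside a string expression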

In pyspark, Spark SQL syntax like:

where column_n like 'xyz%'

might not work.

Use:

where column_n RLIKE '^xyz' 

This works perfectly fine.
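For context, a hedged sketch of that same RLIKE clause inside a full query (the table name t and the column column_n are placeholders; shown with the Scala API, the SQL text is identical when run from pyspark's sqlContext.sql):

df.registerTempTable("t")  // Spark 1.3-era API; newer versions use createOrReplaceTempView
val matched = sqlContext.sql("SELECT * FROM t WHERE column_n RLIKE '^xyz'")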
