How to use PySpark to find whether a column contains one or more words in its string sentence
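This could be a working solution for you: use the built-in array_contains() function instead of looping through every item. To apply it, the data needs a small preparation step first: the string column has to be split into an array of words.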
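from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists
# Sample data: an id column (col1) and a sentence column (col2)
df = spark.createDataFrame(
    [(1, "This is a Horse"), (2, "Monkey Loves trees"), (3, "House has a tree"), (4, "The Ocean is Cold")],
    ["col1", "col2"],
)
df.show(truncate=False)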
+----+------------------+
|col1|col2              |
+----+------------------+
|1   |This is a Horse   |
|2   |Monkey Loves trees|
|3   |House has a tree  |
|4   |The Ocean is Cold |
+----+------------------+
# Split each sentence into an array of words
df = df.withColumn("col2", F.split("col2", " "))
# Flag rows whose word array contains "This" or "tree" (exact, case-sensitive match)
df = df.withColumn("array_filter", F.array_contains("col2", "This") | F.array_contains("col2", "tree"))
df = df.filter(F.col("array_filter"))
df.show(truncate=False)
+----+---------------------+------------+
|col1|col2                 |array_filter|
+----+---------------------+------------+
|1   |[This, is, a, Horse] |true        |
|3   |[House, has, a, tree]|true        |
+----+---------------------+------------+
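
Note that row 2 ("Monkey Loves trees") is not in the result, because array_contains() compares whole array elements, and "trees" is not equal to "tree". The following plain-Python sketch mirrors that logic outside Spark to make the matching rule explicit. It is only an illustration: `contains_any_word` is a hypothetical helper name, not a PySpark API.

```python
# Plain-Python mirror of the split + array_contains logic above.
# Illustrative only: contains_any_word is a hypothetical helper, not part of PySpark.
rows = [
    (1, "This is a Horse"),
    (2, "Monkey Loves trees"),
    (3, "House has a tree"),
    (4, "The Ocean is Cold"),
]


def contains_any_word(sentence, words):
    """Return True if any of `words` appears as a whole token in `sentence`.

    The comparison is exact and case-sensitive, so "trees" does not match "tree".
    """
    tokens = sentence.split(" ")  # same role as F.split("col2", " ")
    return any(word in tokens for word in words)  # same role as array_contains


matches = [(row_id, text) for row_id, text in rows if contains_any_word(text, ["This", "tree"])]
print(matches)  # [(1, 'This is a Horse'), (3, 'House has a tree')]
```

If substring or case-insensitive matching is what you actually need in Spark, a regex predicate on the original string column (for example via Column.rlike) may be a closer fit than splitting into an array.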