Get position of substring after a specific position in Pyspark
I have a table like this:
+----+---------------------+
| id | word                |
+----+---------------------+
| 1  | today is a nice day |
| 2  | hello world         |
| 3  | he is good          |
| 4  | is it raining?      |
+----+---------------------+
I want to get the position of the substring "is" in the word column, but only if it occurs after the third position:
+----+---------------------+-----------------+
| id | word                | substr_position |
+----+---------------------+-----------------+
| 1  | today is a nice day | 7               |
| 2  | hello world         | 0               |
| 3  | he is good          | 4               |
| 4  | is it raining?      | 0               |
+----+---------------------+-----------------+
Any help?
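You can use the locate function in Spark.
It returns the position of the first occurrence of a substring in a string column, searching from a given position. The position is 1-based, and 0 is returned when the substring is not found.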
from pyspark.sql.functions import locate, col
df.withColumn("substr_position", locate("is", col("word"), pos=3)).show()
+---+-------------------+---------------+
| id|               word|substr_position|
+---+-------------------+---------------+
|  1|today is a nice day|              7|
|  2|        hello world|              0|
|  3|         he is good|              4|
|  4|     is it raining?|              0|
+---+-------------------+---------------+
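To see why these numbers come out, here is a rough plain-Python model of the same lookup. It is only a sketch of the semantics and not Spark's implementation: the helper below is a hypothetical stand-in that happens to share the name locate, built on str.find, and returning 0 for pos < 1 is a choice made for this sketch.

```python
def locate(substr: str, s: str, pos: int = 1) -> int:
    """Plain-Python model (not Spark code): 1-based index of the first
    occurrence of substr in s, searching from position pos; 0 if absent."""
    if pos < 1:
        # Assumption of this sketch: treat invalid start positions as "not found".
        return 0
    # str.find is 0-based and returns -1 when absent, so shift both ways.
    return s.find(substr, pos - 1) + 1


rows = [
    (1, "today is a nice day"),
    (2, "hello world"),
    (3, "he is good"),
    (4, "is it raining?"),
]

# Same call shape as the PySpark snippet: search for "is" from position 3.
result = [(row_id, word, locate("is", word, pos=3)) for row_id, word in rows]
for row in result:
    print(row)
```

Row 4 gives 0 because its only "is" starts at position 1, before the search begins. Note that in this model a match starting exactly at position 3 still counts (for example "this" yields 3). I believe Spark behaves the same way, but it is worth checking on your version; if the match must start strictly after the third position, pos=4 would be the value to try.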