
Get position of substring after a specific position in Pyspark

I have a table like this:

+-----+-----------------------+
| id  |                 word  |
+-----+-----------------------+
|  1  |  today is a nice day  |
|  2  |          hello world  |
|  3  |           he is good  |
|  4  |       is it raining?  |
+-----+-----------------------+

I want to get the position of the word "is", but only when it occurs after the third position:

+-----+-----------------------+-----------------+
| id  |                 word  |  substr_position|
+-----+-----------------------+-----------------+
|  1  |  today is a nice day  |              7  |
|  2  |          hello world  |              0  |
|  3  |           he is good  |              4  |
|  4  |       is it raining?  |              0  |
+-----+-----------------------+-----------------+

Any help?

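You can use the locate function in Spark.
It returns the position of the first occurrence of a substring in a string column after a given position. Positions are 1-based, and the result is 0 when the substring is not found.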

from pyspark.sql.functions import locate, col

# Start searching for "is" from position 3 of the "word" column
df.withColumn("substr_position", locate("is", col("word"), pos=3)).show()

+---+-------------------+---------------+
| id|               word|substr_position|
+---+-------------------+---------------+
|  1|today is a nice day|              7|
|  2|        hello world|              0|
|  3|         he is good|              4|
|  4|     is it raining?|              0|
+---+-------------------+---------------+
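
As a rough illustration of the semantics, the same lookup can be modelled in plain Python. This is only a sketch, not Spark's implementation: `locate_py` is a hypothetical helper, and it assumes that `pos` is 1-based and that the search begins at `pos` itself, which is consistent with the output shown above.

```python
def locate_py(substr: str, s: str, pos: int = 1) -> int:
    """Hypothetical plain-Python model of a 1-based locate()."""
    if pos < 1:
        # Assumption of this sketch: a non-positive start finds nothing.
        return 0
    # str.find is 0-based and returns -1 on a miss, so adding 1 gives
    # a 1-based position, or 0 when the substring is not found.
    return s.find(substr, pos - 1) + 1


words = [
    "today is a nice day",
    "hello world",
    "he is good",
    "is it raining?",
]

# Same search as in the answer: look for "is" starting at position 3.
positions = [locate_py("is", w, pos=3) for w in words]
print(positions)  # [7, 0, 4, 0]
```

Row 4 gives 0 because its only "is" sits at position 1, before the starting position.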

