I have a table like this:
+----+---------------------+
| id | word                |
+----+---------------------+
| 1  | today is a nice day |
| 2  | hello world         |
| 3  | he is good          |
| 4  | is it raining?      |
+----+---------------------+
I want to get the position of a substring ("is") in the word column, but only if it occurs after the 3rd position:
+----+---------------------+-----------------+
| id | word                | substr_position |
+----+---------------------+-----------------+
| 1  | today is a nice day | 7               |
| 2  | hello world         | 0               |
| 3  | he is good          | 4               |
| 4  | is it raining?      | 0               |
+----+---------------------+-----------------+
Any help?
You can use the locate function in Spark.
It returns the 1-based position of the first occurrence of a substring in a string column, searching from a given start position (pos), and returns 0 if the substring is not found.
from pyspark.sql.functions import locate, col
df.withColumn("substr_position", locate("is", col("word"), pos=3)).show()
+---+-------------------+---------------+
| id| word|substr_position|
+---+-------------------+---------------+
| 1|today is a nice day| 7|
| 2| hello world| 0|
| 3| he is good| 4|
| 4| is it raining?| 0|
+---+-------------------+---------------+
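To see where those numbers come from, the same 1-based search can be mimicked in plain Python. This is only a sketch: locate_py is a hypothetical helper, not part of PySpark, and it assumes that locate starts its search at the character numbered pos and yields 0 for a start position below 1.

```python
def locate_py(substr: str, s: str, pos: int = 1) -> int:
    """Hypothetical plain-Python stand-in for a 1-based substring search.

    Returns the 1-based index of the first occurrence of `substr` in `s`,
    searching from the 1-based position `pos`, or 0 if there is no match.
    """
    if pos < 1:
        # Assumption: a start position below 1 is treated as "no match".
        return 0
    # str.find is 0-based and returns -1 when nothing is found,
    # so adding 1 maps a miss to 0 and a hit to its 1-based index.
    return s.find(substr, pos - 1) + 1


words = ["today is a nice day", "hello world", "he is good", "is it raining?"]
positions = [locate_py("is", w, pos=3) for w in words]
print(positions)  # [7, 0, 4, 0]
```

Under that assumption, a match that begins exactly at position 3 would also be reported. If "after the 3rd position" is meant strictly, pos=4 may be the safer choice; for the sample rows both values give the same result.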