[英]How to split a spark dataframe column string?
我有一個看起來像這樣的數據框:
|--------------------------------------|---------|---------|
| path | content|
|------------------------------------------------|---------|
| /root/path/main_folder1/folder1/path1.txt | Val 1 |
|------------------------------------------------|---------|
| /root/path/main_folder1/folder2/path2.txt | Val 1 |
|------------------------------------------------|---------|
| /root/path/main_folder1/folder2/path3.txt | Val 1 |
|------------------------------------------------|---------|
我想用“/”分割路徑中的列值,並只獲取值直到 /root/path/mainfolder1 我想要的輸出是
|--------------------------------------|---------|---------|---------------------------|
| path | content| root_path |
|------------------------------------------------|---------|---------------------------|
| /root/path/main_folder1/folder1/path1.txt | Val 1 | /root/path/main_folder1 |
|------------------------------------------------|---------|---------------------------|
| /root/path/main_folder1/folder2/path2.txt | Val 1 | /root/path/main_folder1 |
|------------------------------------------------|---------|---------------------------|
| /root/path/main_folder1/folder2/path3.txt | Val 1 | /root/path/main_folder1 |
|------------------------------------------------|---------|---------------------------|
我知道我必須使用 withColumn split 和 regexp_extract 但我並沒有安靜地了解如何限制 regexp_extract 的輸出。
我必須做什么才能獲得所需的輸出?
您可以使用正則表達式來提取前三個目錄級別。
df.withColumn("root_path", F.regexp_extract(F.col("path"), "^((/\w*){3})",1))\
.show(truncate=False)
輸出:
+-----------------------------------------+-------+-----------------------+
|path |content|root_path |
+-----------------------------------------+-------+-----------------------+
|/root/path/main_folder1/folder1/path1.txt|val 1 |/root/path/main_folder1|
|/root/path/main_folder1/folder2/path2.txt|val 2 |/root/path/main_folder1|
|/root/path/main_folder1/folder2/path3.txt|val 3 |/root/path/main_folder1|
+-----------------------------------------+-------+-----------------------+
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.