With a PySpark dataframe, locate a value in one array based on an index and copy it to another array
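In this dataframe I have the following two arrays: discount_applications and line_items. The line_items array has an inner array named discount_allocations, which has a field named discount_application_index. The requirement is to take the discount_application_index value, find the "type" value at that index in the discount_applications array, and copy it into the corresponding application_type field.
Here is the dataframe: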
records = '[{"_c":{"discount_applications":[{"type":"manual0"},{"type":"manual1"},{"type":"manual2"},{"type":"manual3"}],"line_items":[{"discount_allocations":[{"application_type":"","discount_application_index":0}]},{"discount_allocations":[{"application_type":"","discount_application_index":1}]},{"discount_allocations":[{"application_type":"","discount_application_index":2}]},{"discount_allocations":[{"application_type":"","discount_application_index":3}]}]}},{"_c":{"discount_applications":[{"type":"manual0"},{"type":"manual1"},{"type":"manual2"}],"line_items":[{"discount_allocations":[{"application_type":"","discount_application_index":0}]},{"discount_allocations":[{"application_type":"","discount_application_index":1}]},{"discount_allocations":[{"application_type":"","discount_application_index":2}]}]}},{"_c":{"discount_applications":[{"type":"manual0"},{"type":"manual1"},{"type":"manual2"}],"line_items":[{"discount_allocations":[{"application_type":"","discount_application_index":0}]},{"discount_allocations":[{"application_type":"","discount_application_index":1}]},{"discount_allocations":[{"application_type":"","discount_application_index":2}]}]}}]'
df = spark.read.json(sc.parallelize([records]))
df.printSchema()
df.show(truncate=False)
root
|-- _c: struct (nullable = true)
| |-- discount_applications: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- type: string (nullable = true)
| |-- line_items: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- discount_allocations: array (nullable = true)
| | | | |-- element: struct (containsNull = true)
| | | | | |-- application_type: string (nullable = true)
| | | | | |-- discount_application_index: long (nullable = true)
+--------------------------------------------------------------------------------------------+
|_c |
+--------------------------------------------------------------------------------------------+
|[[[manual0], [manual1], [manual2], [manual3]], [[[[, 0]]], [[[, 1]]], [[[, 2]]], [[[, 3]]]]]|
|[[[manual0], [manual1], [manual2]], [[[[, 0]]], [[[, 1]]], [[[, 2]]]]] |
|[[[manual0], [manual1], [manual2]], [[[[, 0]]], [[[, 1]]], [[[, 2]]]]] |
+--------------------------------------------------------------------------------------------+
After the transformation, the dataframe should look like this:
+------------------------------------------------------------------------------------------------------------------------+
|_c |
+------------------------------------------------------------------------------------------------------------------------+
|[[[manual0], [manual1], [manual2], [manual3]], [[[[manual0, 0]]], [[[manual1, 1]]], [[[manual2, 2]]], [[[manual3, 3]]]]]|
|[[[manual0], [manual1], [manual2]], [[[[manual0, 0]]], [[[manual1, 1]]], [[[manual2, 2]]]]] |
|[[[manual0], [manual1], [manual2]], [[[[manual0, 0]]], [[[manual1, 1]]], [[[manual2, 2]]]]] |
+------------------------------------------------------------------------------------------------------------------------+
Just keep a clear head and work through the nested transforms :)
import pyspark.sql.functions as F
df2 = df.withColumn(
    '_c',
    F.expr("""
        struct(
            _c.discount_applications,
            transform(
                _c.line_items,
                x -> struct(
                    transform(
                        x.discount_allocations,
                        y -> struct(
                            _c.discount_applications[int(y.discount_application_index)].type as application_type,
                            y.discount_application_index as discount_application_index
                        )
                    ) as discount_allocations
                )
            ) as line_items
        )
    """)
)
df2.show(truncate=False)
+------------------------------------------------------------------------------------------------------------------------+
|_c |
+------------------------------------------------------------------------------------------------------------------------+
|[[[manual0], [manual1], [manual2], [manual3]], [[[[manual0, 0]]], [[[manual1, 1]]], [[[manual2, 2]]], [[[manual3, 3]]]]]|
|[[[manual0], [manual1], [manual2]], [[[[manual0, 0]]], [[[manual1, 1]]], [[[manual2, 2]]]]] |
|[[[manual0], [manual1], [manual2]], [[[[manual0, 0]]], [[[manual1, 1]]], [[[manual2, 2]]]]] |
+------------------------------------------------------------------------------------------------------------------------+
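To check the lookup logic without a Spark session, the same nested walk can be sketched in plain Python over one parsed record. This is only an illustration, not part of the Spark solution: the helper name `fill_application_types` and the shortened `sample` record are made up for this example, and returning `None` for an out-of-range index is an assumption chosen here (how Spark itself treats an out-of-range array index depends on its ANSI settings, so verify that separately if it matters).

```python
import copy
import json


def fill_application_types(record):
    """Return a copy of `record` whose application_type fields are filled in.

    Hypothetical helper: mirrors the two nested transform() calls, one over
    line_items and one over each item's discount_allocations.
    """
    result = copy.deepcopy(record)  # leave the input record untouched
    body = result["_c"]
    applications = body["discount_applications"]
    for item in body["line_items"]:
        for alloc in item["discount_allocations"]:
            idx = alloc["discount_application_index"]
            # Assumption for this sketch: a missing or out-of-range index yields None
            in_range = idx is not None and 0 <= idx < len(applications)
            alloc["application_type"] = applications[idx]["type"] if in_range else None
    return result


# Shortened sample in the same shape as the question's records
sample = json.loads(
    '{"_c":{"discount_applications":[{"type":"manual0"},{"type":"manual1"}],'
    '"line_items":['
    '{"discount_allocations":[{"application_type":"","discount_application_index":1}]},'
    '{"discount_allocations":[{"application_type":"","discount_application_index":0}]}]}}'
)

filled = fill_application_types(sample)
print([
    alloc["application_type"]
    for item in filled["_c"]["line_items"]
    for alloc in item["discount_allocations"]
])
# prints ['manual1', 'manual0']
```

The indexes in the sample are deliberately out of order, so the output shows that the type is picked by discount_application_index rather than by the position of the line item.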