PySpark DataFrame - Create a column from another dataframe

I'm working in a Python 3 notebook in Azure Databricks with Spark 3.0.1.

I have the following DataFrame

+---+---------+
|ID |Name     |
+---+---------+
|1  |John     |
|2  |Michael  |
+---+---------+

It can be created with this code:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Each tuple must have exactly as many values as the schema has fields.
data2 = [
    (1, "John"),
    (2, "Michael"),
]

schema = StructType([
    StructField("ID", IntegerType(), True),
    StructField("Name", StringType(), True),
])

df1 = spark.createDataFrame(data=data2, schema=schema)
df1.show(truncate=False)

I am trying to transform it into an object that can be serialized to JSON, with a single property called Entities that holds an array of the rows in the DataFrame.

Like this:

{
    "Entities": [
        {
            "ID": 1,
            "Name": "John"
        },
        {
            "ID": 2,
            "Name": "Michael"
        }
    ]
}

I've been trying to figure out how to do it but haven't had any luck so far. Can anyone point me in the right direction please?

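Try this: pack the columns you need into a struct, then collect all the structs into a single array column named Entities.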

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql import functions as F

data2 = [
    (1, "John", "Doe"),
    (2, "Michael", "Douglas"),
]
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("fname", StringType(), True),
    StructField("lname", StringType(), True),
])
df1 = spark.createDataFrame(data2, schema)

df = (
    df1
    # Pack the columns to keep into one struct per row.
    .withColumn("profile", F.struct("id", "fname"))
    # groupby() with no keys aggregates the whole DataFrame into a single row.
    .groupby()
    # Gather every struct into one array column named Entities.
    .agg(F.collect_list("profile").alias("Entities"))
)
# coalesce(1) yields a single part file inside the "test" output directory.
df.select("Entities").coalesce(1).write.format('json').save('test', mode="overwrite")
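
Note that save('test', ...) produces a directory rather than a single file: Spark puts the data in a part-*.json file next to marker files such as _SUCCESS, with one JSON object per line. A minimal sketch for loading that output back with the standard library follows. read_spark_json_dir is a hypothetical helper name, and it assumes the directory is reachable as a local path (on Databricks that would likely mean going through the /dbfs/ mount, which is an assumption about the environment).

```python
import glob
import json
import os


def read_spark_json_dir(path):
    """Parse every record from the part files in a Spark JSON output directory.

    Hypothetical helper: assumes uncompressed output with the default
    part-*.json file naming and a locally readable path.
    """
    records = []
    # Only the part files hold data; _SUCCESS and hidden .crc files are skipped.
    for part in sorted(glob.glob(os.path.join(path, "part-*.json"))):
        with open(part, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # one JSON object per non-empty line
                    records.append(json.loads(line))
    return records
```

With the single aggregated row written above, read_spark_json_dir('test') should return a one-element list whose only item is the object holding the Entities array.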

Output file (pretty-printed here for readability; Spark writes each record on a single line):

{
    "Entities": [{
        "id": 1,
        "fname": "John"
    }, {
        "id": 2,
        "fname": "Michael"
    }]
}
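
If the DataFrame is small enough to fit on the driver, another option is to collect the rows and build the object in plain Python, which also keeps the ID and Name keys from the question. The sketch below uses plain tuples as stand-ins for df1.columns and df1.collect(); Spark Row objects are tuple subclasses, so the same zip should work on them. rows_to_entities is a hypothetical helper name, not part of the Spark API.

```python
import json


def rows_to_entities(columns, rows):
    """Wrap rows in a dict whose single "Entities" key holds one dict per row."""
    return {"Entities": [dict(zip(columns, row)) for row in rows]}


# Stand-ins for df1.columns and df1.collect() from the question (assumed values).
columns = ["ID", "Name"]
rows = [(1, "John"), (2, "Michael")]

payload = rows_to_entities(columns, rows)
print(json.dumps(payload, indent=4))
```

This pulls all of the data onto the driver, so it only suits small results; the collect_list approach keeps the aggregation inside Spark.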


 