
Get field values from a StructType in a PySpark DataFrame

I need to get the schema (column names and data types) from a CSV file. This is what I have so far:

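from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext  # sc was used but never defined

l = [('Alice', 1)]
Person = Row('name', 'age')
rdd = sc.parallelize(l)
person = rdd.map(lambda r: Person(*r))
df2 = spark.createDataFrame(person)
print(df2.schema)
#StructType(List(StructField(name,StringType,true),StructField(age,LongType,true)))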

I want to extract the column names (name and age) along with their types (StringType and LongType), but I don't see any method on StructType that does this.

Scala's StructType has a toDDL method, but it is not available in Python.

This is an extension of an earlier question where I already got help, but I wanted to start a new thread: Get dataframe schema load to metadata table

Thanks for the reply. Here is the full updated code:

import pyspark  # only run after findspark.init()
from pyspark.sql import Row, SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.sql.catalogImplementation", "in-memory") \
    .getOrCreate()
sc = spark.sparkContext  # sc was used below but never defined

l = [('Alice', 1)]
Person = Row('name', 'age')
rdd = sc.parallelize(l)
person = rdd.map(lambda r: Person(*r))
df2 = spark.createDataFrame(person)
# dtypes is a list of (column name, type string) tuples
df3 = df2.dtypes
df1 = spark.createDataFrame(df3, ['colname', 'datatype'])
df1.show()
df1.createOrReplaceTempView("test")
spark.sql('''select * from test ''').show()

Output

+-------+--------+
|colname|datatype|
+-------+--------+
|   name|  string|
|    age|  bigint|
+-------+--------+

+-------+--------+
|colname|datatype|
+-------+--------+
|   name|  string|
|    age|  bigint|
+-------+--------+

IIUC, you can loop over df2.schema.fields and read each field's name and dataType:

print([(x.name, x.dataType) for x in df2.schema.fields])
#[('name', StringType), ('age', LongType)]

There is also dtypes:

print(df2.dtypes)
#[('name', 'string'), ('age', 'bigint')]

You may also be interested in printSchema():

df2.printSchema()
#root
# |-- name: string (nullable = true)
# |-- age: long (nullable = true)
