I need to get the schema (the column names and data types) from a CSV file. This is what I have so far:
l = [('Alice', 1)]
Person = Row('name', 'age')
rdd = sc.parallelize(l)
person = rdd.map(lambda r: Person(*r))
df2 = spark.createDataFrame(person)
print(df2.schema)
#StructType(List(StructField(name,StringType,true),StructField(age,LongType,true)))
I want to extract the values name and age along with StringType and LongType, but I don't see any method for this on StructType.
There is a toDDL method on StructType in Scala, but it is not available in Python.
This is a follow-up to an earlier question where I already got help, but I wanted to start a new thread: "Get dataframe schema load to metadata table".
Thanks for the reply. Here is the full code:
import pyspark # only run after findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.appName("Python Spark SQL basic example") \
.config("spark.sql.catalogImplementation", "in-memory") \
.getOrCreate()
from pyspark.sql import Row
l = [('Alice', 1)]
Person = Row('name', 'age')
rdd = spark.sparkContext.parallelize(l)  # sc is only predefined in the pyspark shell
person = rdd.map(lambda r: Person(*r))
df2 = spark.createDataFrame(person)
df3=df2.dtypes
df1=spark.createDataFrame(df3, ['colname', 'datatype'])
df1.show()
df1.createOrReplaceTempView("test")
spark.sql('''select * from test ''').show()
Output:
+-------+--------+
|colname|datatype|
+-------+--------+
|   name|  string|
|    age|  bigint|
+-------+--------+
+-------+--------+
|colname|datatype|
+-------+--------+
|   name|  string|
|    age|  bigint|
+-------+--------+
IIUC, you can loop over the values in df2.schema.fields and get the name and dataType of each field:
print([(x.name, x.dataType) for x in df2.schema.fields])
#[('name', StringType), ('age', LongType)]
There is also dtypes:
print(df2.dtypes)
#[('name', 'string'), ('age', 'bigint')]
You may also be interested in printSchema():
df2.printSchema()
#root
# |-- name: string (nullable = true)
# |-- age: long (nullable = true)