
Extract schema labels from a PySpark DataFrame

From a PySpark DataFrame I want to create a Python list of the schema labels at a specific schema "level".

The schema is:

root
 |-- DISPLAY: struct (nullable = true)
 |    |-- 1WO: struct (nullable = true)
 |    |    |-- JPY: struct (nullable = true)
 |    |    |    |-- CHANGE24HOUR: string (nullable = true)
 |    |    |    |-- CHANGEDAY: string (nullable = true)
 |    |-- AAVE: struct (nullable = true)
 |    |    |-- JPY: struct (nullable = true)
 |    |    |    |-- CHANGE24HOUR: string (nullable = true)
 |    |    |    |-- CHANGEDAY: string (nullable = true)

The expected output is:

list = 1WO, AAVE

The following code prints the entire schema:

df.schema.jsonValue()

Is there an easy way to extract those labels, please?

Select the first layer using the asterisk notation, and then list the columns:

df.select('DISPLAY.*').columns
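
A minimal, self-contained sketch of this approach, using made-up sample values shaped like the schema in the question (the SparkSession setup and the sample record are assumptions for illustration):

import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("schema-labels").getOrCreate()

# Sample record shaped like the question's schema: DISPLAY -> <coin> -> JPY -> fields.
sample = {
    "DISPLAY": {
        "1WO": {"JPY": {"CHANGE24HOUR": "0.1", "CHANGEDAY": "0.2"}},
        "AAVE": {"JPY": {"CHANGE24HOUR": "1.5", "CHANGEDAY": "2.0"}},
    }
}
df = spark.read.json(spark.sparkContext.parallelize([json.dumps(sample)]))

# Expand one struct level with the asterisk and read back the generated column names.
labels = df.select("DISPLAY.*").columns
print(labels)  # ['1WO', 'AAVE']

# Alternative that walks the schema object directly instead of selecting columns.
labels_from_schema = [f.name for f in df.schema["DISPLAY"].dataType.fields]
print(labels_from_schema)  # ['1WO', 'AAVE']

Both variants only inspect the analyzed schema, so no data is actually scanned; the second one can be handy if you want to drill several struct levels down without building nested select expressions.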
