
How to eliminate row and column name values from the DataFrame result in PySpark?

Hi, I am loading a CSV file into a DataFrame and running a filter operation on it. I am getting output like this:

[Row(table_name=u'DEMO', rec_count=u'170049', col_count=u'36')]

How can I get the output in the form below instead?

`['DEMO','170049','36']`

I tried unicode conversion, and I can use a for loop to iterate over the data, but the data is dynamic: sometimes I get more than three values. I want to automate the process, but I am unable to get the data in the form above.

You have a list whose element is a Row object. You can use a list of keys to define the columns (and the order) you need in the result, and then extract them from the Row object with a list comprehension:

from pyspark.sql import Row

# this is what you have now
x = [Row(table_name=u'DEMO', rec_count=u'170049', col_count=u'36')]

# pick the columns and the order you want in the result
keys = ['table_name', 'rec_count', 'col_count']
[x[0][key] for key in keys]
# [u'DEMO', u'170049', u'36']
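Since the question mentions that the number of columns is dynamic, a schema-agnostic variant may help: a PySpark `Row` is a subclass of `tuple`, so `list(row)` yields all of its values in field order without naming the columns, and `row.asDict()` gives a field-to-value mapping. A minimal sketch of this idea, using a `namedtuple` as a local stand-in for `Row` (an assumption made so the example runs without a Spark installation; the real `Row` has the same tuple semantics):

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row, which is also a tuple subclass.
Row = namedtuple('Row', ['table_name', 'rec_count', 'col_count'])

x = [Row(table_name='DEMO', rec_count='170049', col_count='36')]

# Schema-agnostic: works however many fields the Row happens to have.
values = list(x[0])
print(values)  # ['DEMO', '170049', '36']
```

With a real PySpark `Row` you could also use `x[0].asDict()` when you need the column names alongside the values.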
