
PySpark: Create a subset of a dataframe for all dates

I have a DataFrame with many columns, and I need to create a subset of it that contains only the columns with date values.

For example, my DataFrame could be:

1, 'John Smith', '12/10/1982', '123 Main St', '01/01/2000'
2, 'Jane Smith', '11/21/1999', 'Abc St', '12/12/2020'

And my new DataFrame should only have:

'12/10/1982', '01/01/2000'
'11/21/1999', '12/12/2020'

The dates could be in any format and could be in any column. I can use dateutil.parser to confirm that values are dates, but I'm not sure how to easily call parse() on all the columns and keep only those that parse successfully in a new DataFrame.
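
A minimal sketch of that idea, assuming a Spark DataFrame named df: sample a few rows, try parsing every column's values with dateutil, and select only the columns where parsing succeeds. The 100-row sample size and the bare-number guard are my assumptions (bare integers such as the ID column would otherwise parse as dates too):

from dateutil.parser import parse

def looks_like_date(value):
    # True only if value parses as a date and is not a bare number;
    # dateutil happily parses plain integers like row IDs as dates.
    s = str(value)
    if s.strip().isdigit():
        return False
    try:
        parse(s)
        return True
    except (ValueError, OverflowError):
        return False

# Inspect a small sample instead of parsing the whole DataFrame.
sample_rows = df.limit(100).collect()

# Keep a column only if every sampled non-null value parses as a date.
# (A column that is all null in the sample would also be kept here.)
date_cols = [c for c in df.columns
             if all(looks_like_date(row[c]) for row in sample_rows if row[c] is not None)]

dates_df = df.select(*date_cols)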

If you know which columns the datetimes are in, it's easy:

df2 = df[["col_name_1", "col_name_2"]]  # select columns by name
# or select columns by position
df2 = df.iloc[:, [2, 4]]
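
Note that this is pandas indexing. Since the question mentions PySpark, a rough Spark equivalent (the column names here are hypothetical) would be:

df2 = df.select("birth_date", "start_date")    # by name
df2 = df.select(df.columns[2], df.columns[4])  # by position; Spark has no .iloc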
