
PySpark: Create a subset of a dataframe for all dates

I have a DataFrame with many columns, and I need to create a subset of it that contains only the columns with date values.

For example, my DataFrame could be:

1, 'John Smith', '12/10/1982', '123 Main St', '01/01/2000'
2, 'Jane Smith', '11/21/1999', 'Abc St', '12/12/2020'

And my new DataFrame should only have:

'12/10/1982', '01/01/2000'
'11/21/1999', '12/12/2020'

The dates could be in any format and could be in any column. I can use dateutil.parser to confirm that values are dates, but I'm not sure how to easily call parse() on all the columns and keep only those that parse successfully in a new DataFrame.
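
A minimal sketch of that idea, assuming a Spark DataFrame named df: sample a few rows, try parsing every column's values with dateutil, and select only the columns where parsing succeeds. The 100-row sample size and the bare-number guard are my assumptions (bare integers such as the ID column would otherwise parse as dates too):

from dateutil.parser import parse

def looks_like_date(value):
    # True only if value parses as a date and is not a bare number;
    # dateutil happily parses plain integers like row IDs as dates.
    s = str(value)
    if s.strip().isdigit():
        return False
    try:
        parse(s)
        return True
    except (ValueError, OverflowError):
        return False

# Inspect a small sample instead of parsing the whole DataFrame.
sample_rows = df.limit(100).collect()

# Keep a column only if every sampled non-null value parses as a date.
# (A column that is all null in the sample would also be kept here.)
date_cols = [c for c in df.columns
             if all(looks_like_date(row[c]) for row in sample_rows if row[c] is not None)]

dates_df = df.select(*date_cols)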

If you know which columns the datetimes are in, it's easy:

df2 = df[["col_name_1", "col_name_2"]]  # select columns by name
# or select columns by position
df2 = df.iloc[:, [2, 4]]
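
Note that this is pandas indexing. Since the question mentions PySpark, a rough Spark equivalent (the column names here are hypothetical) would be:

df2 = df.select("birth_date", "start_date")    # by name
df2 = df.select(df.columns[2], df.columns[4])  # by position; Spark has no .iloc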
