I have a dataframe with a column containing a list of dates:
data = [
[
1,
[
"2017-12-06",
"2017-12-05",
"2017-12-06",
"2018-01-03",
"2018-01-04",
"2017-11-24",
],
],
[
2,
[
"2019-03-10",
"2018-12-03",
"2018-12-04",
"2018-11-08",
"2018-11-30",
"2019-03-22",
"2018-11-24",
"2019-03-06",
"2017-11-16",
],
],
]
df = pd.DataFrame(data, columns=["id", "dates"])
df
id dates
1 [2017-12-06, 2017-12-05, 2017-12-06, 2018-01-03, 2018-01-04, 2017-11-24]
2 [2019-03-10, 2018-12-03, 2018-12-04, 2018-11-08, 2018-11-30, 2019-03-22, 2018-11-24, 2019-03-06, 2017-11-16]
print(df.dtypes)
id int64
dates object
dtype: object
I would like to sort the date containing column ( dates
). I have tried a number of methods with no success (including .apply(list.sort) in place
). The only method that I've found that works is using .sort(key = ....)
like below:
import datetime
from datetime import datetime
dates = [
"2019-03-10",
"2018-12-03",
"2018-12-04",
"2018-11-08",
"2018-11-30",
"2019-03-22",
"2018-11-24",
"2019-03-06",
"2017-11-16",
]
dates.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))
but I can only get it to work on a list and I want to apply this to that entire column in the dataframe df
. Can anyone advise the best way to do this? Or perhaps there is an even better way to sort this column?
What I see here is that you want the list in every row to be sorted (not the column itself).
The code below applies a certain function (something like my_sort()
) to each row of column "dates":
df['dates'].apply(my_sort)
You just need to implement my_sort
to be applied to the list in each row. Something like:
def my_sort(dates):
dates.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))
return dates
list.sort()
sorts the list and returns None
so you need to return the list itself after calling sort
.
Edit :
According to the comment from @jch , it's a better practice to copy the list first and then call sort
method. This way, any unexpected behavior or error produced by sort
method (if any happens) won't affect the original list (in your datafram). To achieve that, you can change my_sort
to something like:
from copy import deepcopy
def my_sort(dates):
dates_copy = deepcopy(dates)
dates_copy.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d"))
return dates_copy
You can learn more about copy
and deepcopy
of objects here .
You can use .apply() to apply a given function (in this case 'sort') to every row of a dataframe column.
This should work:
df['dates'].apply(lambda row: row.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%d")))
print(df)
returns:
id dates
0 1 ['2017-11-24', '2017-12-05', '2017-12-06', '2017-12-06', '2018-01-03', '2018-01-04']
1 2 ['2017-11-16', '2018-11-08', '2018-11-24', '2018-11-30', '2018-12-03', '2018-12-04', '2019-03-06', '2019-03-10', '2019-03-22']
Note that in this case the code df['data'] = df['data'].apply(...)
will NOT work because the sort function has a default inplace=True parameter: it directly modifies the dataframe and doesn't create a new one. To apply other functions you might have to use the df = df.apply(etc)
formulation.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.