
How to plot Pandas datetime series in Seaborn distplot?

I have a pandas dataframe with a datetime column. I would like to plot the distribution of the rows according to that date column, but I'm currently getting an unhelpful error. I have:

df['Date'] = pd.to_datetime(df['Date'], errors='raise')
s = sns.distplot(df['Date'])

which throws the error:

TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

If I change the column I'm plotting to numeric data then it all works fine. How can I get the datetime column to behave nicely? I can't really find much about what I think I need in the docs. Any and all help appreciated.

Below is the result of df.head(2); I have removed some columns for security reasons:

               Date                 
2812         2016-03-05
2813         2016-03-05

Apparently the column (when taken as a series) has properties

Name: Date, dtype: datetime64[ns]

I came across this question while having the same problem myself. As mentioned in the comments, it seems that seaborn's distplot does not support datetime data. Unfortunately, I could not find anything in the official documentation to confirm this.

I found two ways to deal with this problem. Neither is perfect, but they are the best I found.

Option 1: Convert dates to numbers

Convert the dates to a numeric measure and work with that. distplot works with numbers, so if each date is represented by a number we will be okay. The mapping between dates and numbers is similar to a min-max scaler. For example, we can set "2017-01-01" as 0 and "2020-06-06" as 1, and map all dates between them to values in the range [0, 1].

Which unit to use depends on the range of your data: it could be days, months, years, and so on.
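A minimal sketch of that [0, 1] mapping, assuming the two endpoints from the example above. The middle date is an arbitrary value added only for illustration, and the variable names are made up here:

```python
import pandas as pd

# Endpoints taken from the example in the text; the middle date is arbitrary.
dates = pd.Series(pd.to_datetime(["2017-01-01", "2018-09-19", "2020-06-06"]))

lo, hi = dates.min(), dates.max()

# Timedelta divided by Timedelta gives a plain float, so the result is numeric.
scaled = (dates - lo) / (hi - lo)
print(scaled)
```

The earliest date maps to 0, the latest to 1, and everything else falls in between, so the resulting column can be passed to distplot like any other numeric column.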

I'll demonstrate this approach with this toy example.

import pandas as pd
import datetime as dt

original_dates = ["2016-03-05", "2016-03-05", "2016-02-05", "2016-02-05", "2016-02-05", "2014-03-05"]
dates_list = [dt.datetime.strptime(date, '%Y-%m-%d').date() for date in original_dates]

df = pd.DataFrame({"Date":dates_list})

The dataframe now looks like this:

         Date
0  2016-03-05
1  2016-03-05
2  2016-02-05
3  2016-02-05
4  2016-02-05
5  2014-03-05

(This is not the best way to get dates into a dataframe, of course, but that doesn't matter here.)

Now I create a new column that holds the difference in days from the minimum date:

df["NewDate"] = df["Date"] - dt.date(2014,3,5)
df["NewDate"] = df["NewDate"].apply(lambda x: x.days)

Result:

         Date  NewDate
0  2016-03-05      731
1  2016-03-05      731
2  2016-02-05      702
3  2016-02-05      702
4  2016-02-05      702
5  2014-03-05        0

Notice that I hard-coded the minimum date. It is better to compute the minimum from the data instead of hard-coding it; I just wanted to get through this part as fast as possible.

Now we can use distplot on our new column:
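One possible way to avoid the hard-coded date, sketched under the assumption that the column is stored as datetime64 (built with pd.to_datetime) rather than as Python date objects:

```python
import pandas as pd

original_dates = ["2016-03-05", "2016-03-05", "2016-02-05",
                  "2016-02-05", "2016-02-05", "2014-03-05"]
df = pd.DataFrame({"Date": pd.to_datetime(original_dates)})

# Take the earliest date in the column as the origin instead of a literal.
origin = df["Date"].min()

# datetime64 minus a Timestamp gives timedeltas; .dt.days extracts whole days.
df["NewDate"] = (df["Date"] - origin).dt.days
print(df)
```

With a datetime64 column the `.dt.days` accessor replaces the `apply(lambda x: x.days)` step.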

import seaborn as sns
sns.set()
ax = sns.distplot(df['NewDate'])

Output:

[Image: Seaborn distplot with dates]

As you can see, the x-axis shows day counts instead of dates. For my own problem that was fine. If you want to show dates, an extra step is needed; see the question "Show xticks which are function of x-axis, not directly the data it self. Example with dates (pandas, matplotlib)".

As I said earlier, I scaled by the difference in days, but you can do the same with months or years, depending on the data.
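A rough sketch of that extra step, using matplotlib's FuncFormatter to turn each tick's day offset back into a date. This is an assumption about how one might do it, not the approach from the linked question: `offset_to_label` is a hypothetical helper, and a plain `ax.hist` stands in for the seaborn call (the formatter is attached to the Axes, so it should work the same on the Axes that distplot returns).

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

origin = dt.date(2014, 3, 5)                 # origin used to build NewDate
day_offsets = [731, 731, 702, 702, 702, 0]   # the NewDate column from above


def offset_to_label(x, pos=None):
    """Convert a day offset on the x-axis back to an ISO date string."""
    return (origin + dt.timedelta(days=int(round(x)))).isoformat()


fig, ax = plt.subplots()
ax.hist(day_offsets, bins=10)                # stand-in for the seaborn plot
ax.xaxis.set_major_formatter(FuncFormatter(offset_to_label))
fig.autofmt_xdate()                          # rotate labels to avoid overlap
```

The data stays numeric; only the tick labels change, so the shape of the plot is the same as before.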

Option 2: Use a histogram directly, without seaborn's distplot

The question "Can Pandas plot a histogram of dates?" has an answer that shows how to plot a histogram of dates using pandas's groupby.

It's not the same as distplot, but it can be a close-enough solution (distplot is ultimately based on matplotlib's hist).
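A small sketch of the groupby idea on the toy data from Option 1. Grouping by calendar month is an assumption made here for illustration; pick whatever bin width suits your data.

```python
import pandas as pd

original_dates = ["2016-03-05", "2016-03-05", "2016-02-05",
                  "2016-02-05", "2016-02-05", "2014-03-05"]
df = pd.DataFrame({"Date": pd.to_datetime(original_dates)})

# Count rows per calendar month; to_period("M") drops the day component.
counts = df.groupby(df["Date"].dt.to_period("M")).size()
print(counts)

# With matplotlib installed, counts.plot(kind="bar") draws the histogram.
```

Note that only months that actually contain rows appear in the result, so empty months between 2014 and 2016 are not shown as gaps.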

You could convert the dates to Categorical type, and plot the resulting codes (which are integers). Then, label the x-ticks with the Date (as category).

import pandas as pd
import seaborn as sns

original_dates = [
    "2016-03-05", "2016-03-05", "2016-02-05",
    "2016-02-05", "2016-02-05", "2014-03-05"]
dates_list = pd.to_datetime(original_dates)

df = pd.DataFrame({"Date": dates_list})
df['date-as-cat'] = df['Date'].astype('category')  # new 
df['codes'] = df['date-as-cat'].cat.codes          # new 

print(df)
print(df.dtypes)

        Date date-as-cat  codes
0 2016-03-05  2016-03-05      2
1 2016-03-05  2016-03-05      2
2 2016-02-05  2016-02-05      1
3 2016-02-05  2016-02-05      1
4 2016-02-05  2016-02-05      1
5 2014-03-05  2014-03-05      0

Date           datetime64[ns]
date-as-cat          category
codes                    int8
dtype: object 

The mapping between each code and its date (as a category) is obtained like this:

x = df[['codes', 'date-as-cat']].drop_duplicates().sort_values('codes')
print(x)

   codes date-as-cat
5      0  2014-03-05
2      1  2016-02-05
0      2  2016-03-05
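The labelling step could look roughly like the following. This is only a sketch under assumptions: it uses matplotlib's `ax.hist` with one bin per code (the seaborn call could be swapped in on the same Axes), and `tick_positions`, `tick_labels` and `bin_edges` are names made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

original_dates = ["2016-03-05", "2016-03-05", "2016-02-05",
                  "2016-02-05", "2016-02-05", "2014-03-05"]
df = pd.DataFrame({"Date": pd.to_datetime(original_dates)})
df["date-as-cat"] = df["Date"].astype("category")
df["codes"] = df["date-as-cat"].cat.codes

# Categories are sorted, so position i in `categories` corresponds to code i.
categories = df["date-as-cat"].cat.categories
tick_positions = list(range(len(categories)))
tick_labels = [d.strftime("%Y-%m-%d") for d in categories]

# One bin per code, centred on the integer code value.
bin_edges = [i - 0.5 for i in range(len(categories) + 1)]

fig, ax = plt.subplots()
ax.hist(df["codes"], bins=bin_edges)
ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels)
```

Keep in mind that the codes are evenly spaced, so the two-year gap between 2014-03-05 and 2016-02-05 takes up the same width as the one-month gap after it.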
