How to specify date bin ranges for Seaborn displot

Problem statement

I am creating a distribution plot of flood events per N-year period, starting in 1870. I am using Pandas and Seaborn. I need help with...

  1. specifying the date range of each bin when using sns.displot, and
  2. clearly representing my bin size specifications along the x axis.

To clarify this problem, here is the data that I am working with, what I have tried, and a description of the desired output.

The Data

The data I am using is available from the US National Weather Service.

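import pandas as pd
import bs4
import urllib.request

link = "https://water.weather.gov/ahps2/crests.php?wfo=jan&gage=jacm6&crest_type=historic"

# str() of the raw bytes keeps each newline as the two literal characters "\n",
# which is what the split below relies on
webpage = str(urllib.request.urlopen(link).read())
soup = bs4.BeautifulSoup(webpage, "html.parser")  # name the parser explicitly

tbl = soup.find('div', class_='water_information')
vals = tbl.get_text().split(r'\n')

# extract rank, stage and date from entries of the form "(rank) stage ft on MM/DD/YYYY"
tcdf = pd.Series(vals).str.extractall(r'\((?P<Rank>\d+)\)\s(?P<Stage>\d+\.\d+)\sft\son\s(?P<Date>\d{2}\/\d{2}\/\d{4})')\
    .reset_index(drop=True)

# convert the extracted strings to proper dtypes
tcdf['Stage'] = tcdf.Stage.astype(float)
total_crests_events = len(tcdf)
tcdf['Rank'] = tcdf.Rank.astype(int)
tcdf['Date'] = pd.to_datetime(tcdf.Date)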

What works

I am able to plot the data with Seaborn's displot, and I can change the number of bins with the bins parameter.

The second image is closer to my desired output. However, I do not think it is clear where the bins start and end. For example, the first two bins (reading left to right) clearly start before and end after 1880, but the precise years are not clear.

import seaborn as sns
# fig. 1: data distribution using default bin parameters
sns.displot(data=tcdf, x="Date")
# fig. 2: data distribution using 40 bins
sns.displot(data=tcdf, x="Date", bins=40)

[Figure: data distribution using default bin parameters]

What fails

I tried specifying date ranges using the bins input. The approach is loosely based on a previous SO thread.

my_bins = pd.date_range(start='1870', end='2025', freq='5YS')
sns.displot(data=tcdf, x="Date", bins=my_bins)

This attempt, however, produced a TypeError:

TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

This is a long question, so I imagine that some clarification might be necessary. Please do not hesitate to ask questions in the comments.

Thanks in advance.

Seaborn internally converts its input data to numbers so that it can do math on them, and it uses matplotlib's "unit conversion" machinery to do that. The bin edges have to be in that same numeric form, so the easiest way to pass bins that will work is to convert them with matplotlib's date converter:

import matplotlib.dates as mdates

sns.displot(data=tcdf, x="Date", bins=mdates.date2num(my_bins))

