dataframe.plot exclude missing data

Question

I want to plot some data over time. My dataframe has one column, Date, with values like 2015-11-25 10:00:00 (datetime64). The other column, Data, holds values like 1.53 (just a series of float64 numbers).

The tricky part is that the samples were taken in separate series, e.g.:

1st series from 2015-11-20 00:00:00 till 2015-11-21 00:00:00
2nd series from 2015-11-22 00:00:00 till 2015-11-23 00:00:00
3rd series from 2015-11-24 00:00:00 till 2015-11-25 00:00:00

All the rows are stored one below the other, so there are no empty rows in the dataframe itself.

So when I execute my code:

ax = df.plot(x='Date', y='Data') 
fig = ax.get_figure()

I get a graph that fills in data on the dates when I never measured anything. All I want is a graph that shows the data on the ACTUAL dates I measured. I don't understand why Python extrapolates these data points. How can I turn off this feature?

Answer 1

Pandas' plot() function by default creates a line plot. If you only want to plot the data points you have, create a scatter plot instead.

ax = df.plot(kind='scatter', x='Date', y='Data')

See: http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-scatter
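A minimal sketch of a related option that stays with the default line-plot call: passing a marker-only style string such as 'o' should draw the measured points without connecting lines. The sample values below are made up for illustration, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import pandas as pd

# Hypothetical sample data: two short series separated by a day without samples
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-11-20 00:00:00", "2015-11-20 12:00:00",
        "2015-11-22 00:00:00", "2015-11-22 12:00:00",
    ]),
    "Data": [1.53, 1.60, 1.48, 1.55],
})

# style="o" is a matplotlib format string: circle markers, no connecting line
ax = df.plot(x="Date", y="Data", style="o")
fig = ax.get_figure()
```

Because no line is drawn, nothing appears on the dates between the series.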

Edit

Because pandas' scatter plot function requires numeric columns for both the x and y axes, you'll run into issues with my original answer when x is a datetime column. The best way to do this is to plot with matplotlib directly. For what you're trying to do, the sample below should work:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot_date(df['Date'], df['Data'])
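If lines within each measurement series are still wanted, one possible approach is to split the dataframe wherever the time between consecutive samples exceeds a threshold and plot each piece separately, so no line is drawn across the unmeasured dates. The sketch below assumes hourly samples and a two-hour threshold; the helper name plot_series_separately, the max_gap parameter, and the sample data are hypothetical and not part of pandas or matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data: three one-day series of hourly samples,
# matching the date ranges described in the question
ranges = [
    ("2015-11-20", "2015-11-21"),
    ("2015-11-22", "2015-11-23"),
    ("2015-11-24", "2015-11-25"),
]
df = pd.concat(
    [pd.DataFrame({"Date": pd.date_range(start, end, periods=25)})
     for start, end in ranges],
    ignore_index=True,
)
df["Data"] = np.linspace(1.0, 2.0, len(df))


def plot_series_separately(df, max_gap, ax):
    """Draw one line per run of samples; start a new run whenever the
    time since the previous sample is larger than max_gap."""
    # True where a new series starts; the cumulative sum labels each series
    series_id = (df["Date"].diff() > max_gap).cumsum()
    for _, part in df.groupby(series_id):
        ax.plot(part["Date"], part["Data"], color="C0")
    return series_id.nunique()


fig, ax = plt.subplots()
n_series = plot_series_separately(df, pd.Timedelta(hours=2), ax)
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

The threshold has to be larger than the normal sampling interval and smaller than the pause between series, so it needs adjusting to the real data.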
