dataframe.plot exclude missing data

I want to plot some data over time. My dataframe has one column, Date, in the format 2015-11-25 10:00:00 (datetime64); the other column, Data, holds plain numbers such as 1.53 (float64).

The tricky part is that the samples were taken in separate series, e.g.:

  1. 1st series from 2015-11-20 00:00:00 till 2015-11-21 00:00:00
  2. 2nd series from 2015-11-22 00:00:00 till 2015-11-23 00:00:00
  3. 3rd series from 2015-11-24 00:00:00 till 2015-11-25 00:00:00

All the rows sit one below the other in the dataframe, so there are no empty rows for the dates in between.

So when I execute my code:

ax = df.plot(x='Date', y='Data') 
fig = ax.get_figure()

I get a graph that fills in values for the dates on which I never measured. All I want is a graph showing the data on the actual dates I measured. I don't understand why Python extrapolates these data points. How can I turn this feature off?

Pandas' plot() function by default creates a line plot. If you only want to plot the data points you have, create a scatter plot instead.

ax = df.plot(kind='scatter', x='Date', y='Data')

See: http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-scatter
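If you would rather keep the lines within each series and only stop them from bridging the unmeasured dates, one option is to split the rows wherever the time step jumps and plot each chunk separately. The following is a minimal sketch, not a definitive recipe: the sample data, the two-hour gap threshold and the names gap and series_id are assumptions made for illustration, so adjust them to your own sampling interval.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data: three hourly series, each running from midnight
# to the following midnight, with a one-day gap between consecutive series.
starts = pd.to_datetime(["2015-11-20", "2015-11-22", "2015-11-24"])
dates = [s + pd.Timedelta(hours=h) for s in starts for h in range(25)]
df = pd.DataFrame({"Date": dates, "Data": [1.5 + 0.01 * i for i in range(len(dates))]})

# A row starts a new series when the step from the previous row exceeds the
# threshold (assumed here: 2 hours). The cumulative sum labels each series.
gap = df["Date"].diff() > pd.Timedelta(hours=2)
series_id = gap.cumsum()

fig, ax = plt.subplots()
for _, chunk in df.groupby(series_id):
    # One line per series, same colour, so nothing is drawn across the gaps.
    ax.plot(chunk["Date"], chunk["Data"], color="C0")
fig.autofmt_xdate()
```

Each chunk becomes its own line, so the stretches between the series stay empty instead of being joined by a straight segment.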


Edit

Because pandas' scatter plot function requires numeric columns for both the x and y axes, you'll run into issues with my original answer when x is a datetime column. The best way to do this is to plot using matplotlib directly. For what you're trying to do, the sample below should work:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot_date(df['Date'], df['Data'])
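As far as I know, newer matplotlib releases discourage plot_date in favour of the plain plot method, which handles datetime values directly. A hedged equivalent is sketched below; the sample dataframe is made up for illustration, and the marker and linestyle settings are simply one way to draw the points without connecting lines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data with a gap between two measurement series.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-11-20 00:00:00", "2015-11-20 12:00:00", "2015-11-21 00:00:00",
        "2015-11-22 00:00:00", "2015-11-22 12:00:00", "2015-11-23 00:00:00",
    ]),
    "Data": [1.53, 1.60, 1.48, 1.71, 1.66, 1.59],
})

fig, ax = plt.subplots()
# Markers only: with no connecting line, nothing is drawn on unmeasured dates.
ax.plot(df["Date"], df["Data"], linestyle="none", marker="o")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

The x axis still spans the whole date range, but only the measured points appear on it.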
