How do I get a simple scatter plot of a dataframe (preferably with seaborn)?

I'm trying to scatter plot the following dataframe:

import numpy as np
import pandas as pd

mydf = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                     'y': [9, 8, 7, 6, 5, 4, 3, 2, 1],
                     'z': np.random.randint(0, 9, 9)},
                    index=["12:00", "1:00", "2:00", "3:00", "4:00",
                           "5:00", "6:00", "7:00", "8:00"])



        x   y   z
 12:00  1   9   1
  1:00  2   8   1
  2:00  3   7   7
  3:00  4   6   7
  4:00  5   5   4
  5:00  6   4   2
  6:00  7   3   2
  7:00  8   2   8
  8:00  9   1   8

I would like to see the times "12:00", "1:00", ... on the x-axis and the x, y, and z columns on the y-axis.

When I try to plot with pandas via mydf.plot(kind="scatter"), I get the error ValueError: scatter requires and x and y column. Do I have to break my dataframe down into the appropriate parameters? What I would really like is to get this scatter plot with seaborn.
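On the pandas side, that error means DataFrame.plot(kind="scatter") wants explicit x= and y= column names, and both columns must be numeric, so the string index cannot serve as x directly. A minimal sketch, assuming one scatter call per column on a shared Axes is acceptable; the names plot_df and pos are hypothetical and not part of the question:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                     'y': [9, 8, 7, 6, 5, 4, 3, 2, 1],
                     'z': np.random.randint(0, 9, 9)},
                    index=["12:00", "1:00", "2:00", "3:00", "4:00",
                           "5:00", "6:00", "7:00", "8:00"])

# kind="scatter" needs numeric x= and y= columns; the string index is not
# a column at all, so move it into one and add a numeric position for x.
plot_df = mydf.reset_index().rename(columns={"index": "time"})
plot_df["pos"] = range(len(plot_df))

# One scatter call per value column, all drawn on the same Axes.
fig, ax = plt.subplots()
for col, color in zip(["x", "y", "z"], ["C0", "C1", "C2"]):
    plot_df.plot(kind="scatter", x="pos", y=col, ax=ax, color=color, label=col)

# Label the positions with the original time strings instead of 0..8.
ax.set_xticks(plot_df["pos"])
ax.set_xticklabels(plot_df["time"])
ax.set_xlabel("time")
ax.set_ylabel("value")
plt.show()
```

So yes, with plain pandas you do have to name the x and y columns yourself; the seaborn route below avoids the loop by reshaping the data instead.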

Just running

mydf.plot(style=".")

works fine for me:

[Example scatter plot produced by the code above]
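For reference, a self-contained version of that one-liner. With style=".", pandas draws each column as a matplotlib line that has a point marker and no connecting segments, which is why the result looks like a scatter plot; the string index supplies the x tick labels. This sketch only adds the imports and the question's frame:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                     'y': [9, 8, 7, 6, 5, 4, 3, 2, 1],
                     'z': np.random.randint(0, 9, 9)},
                    index=["12:00", "1:00", "2:00", "3:00", "4:00",
                           "5:00", "6:00", "7:00", "8:00"])

# style="." is a matplotlib format string: point markers, no line.
# Each column becomes one Line2D; the string index labels the x-axis.
ax = mydf.plot(style=".")
ax.set_ylabel("value")
plt.show()
```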

Seaborn is actually built around pandas DataFrames. However, your data frame needs to be "tidy":

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.

Since you want to plot x, y, and z on the same plot, they are really different observations of the same kind of measurement. Thus you actually have three variables: time, value, and the letter (x, y, or z) the value belongs to.

The "tidy" standard comes from Hadley Wickham, who implemented it in the tidyr R package.
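As a sketch of what "tidy" means for this frame, pd.melt is one way to reshape it into one row per (time, letter) observation. Note that this is an alternative, not the unstack-based approach this answer uses; the column names time, letter and value are chosen to match the answer's later naming:

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                     'y': [9, 8, 7, 6, 5, 4, 3, 2, 1],
                     'z': np.random.randint(0, 9, 9)},
                    index=["12:00", "1:00", "2:00", "3:00", "4:00",
                           "5:00", "6:00", "7:00", "8:00"])

# Wide -> long: move the index into a "time" column, then stack the
# x, y, z columns into (letter, value) pairs, one row per observation.
tidy = (mydf.rename_axis("time")
            .reset_index()
            .melt(id_vars="time", var_name="letter", value_name="value"))

print(tidy.head())
print(tidy.shape)  # (27, 3)
```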

First, I convert the index to a DatetimeIndex:

mydf.index = pd.DatetimeIndex(mydf.index)
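Parsing strings such as "12:00" without a format relies on pandas' format inference, which has changed across pandas versions (the output further down shows the date of the day the answer was written filled in). If you want deterministic parsing, one option, assuming every label is H:MM or HH:MM, is an explicit format string:

```python
import pandas as pd

labels = ["12:00", "1:00", "2:00", "3:00", "4:00",
          "5:00", "6:00", "7:00", "8:00"]

# "%H:%M" accepts both "1:00" and "12:00". With no date in the format,
# pandas fills in 1900-01-01 as the date part.
idx = pd.to_datetime(labels, format="%H:%M")

print(idx[0])             # 1900-01-01 12:00:00
print(idx.hour.tolist())  # [12, 1, 2, 3, 4, 5, 6, 7, 8]
```

In the answer's code this would read mydf.index = pd.to_datetime(mydf.index, format="%H:%M").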

Then convert the frame to tidy data:

pivoted = mydf.unstack().reset_index()

and rename the columns:

pivoted = pivoted.rename(columns={"level_0": "letter", "level_1": "time", 0: "value"})
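To see where level_0, level_1 and 0 come from, here is the intermediate result of the unstack/reset_index step. This sketch uses an explicit "%H:%M" format (an assumption, to keep it deterministic) instead of the bare pd.DatetimeIndex call:

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                     'y': [9, 8, 7, 6, 5, 4, 3, 2, 1],
                     'z': np.random.randint(0, 9, 9)},
                    index=["12:00", "1:00", "2:00", "3:00", "4:00",
                           "5:00", "6:00", "7:00", "8:00"])
mydf.index = pd.to_datetime(mydf.index, format="%H:%M")

# On a DataFrame with a flat index, unstack() returns a Series indexed by
# (column label, row label), i.e. one entry per (letter, time) pair.
stacked = mydf.unstack()
print(type(stacked).__name__)  # Series
print(stacked.index.nlevels)   # 2

# reset_index() names the unnamed levels "level_0" and "level_1" and puts
# the unnamed values in a column called 0, hence the rename mapping.
pivoted = stacked.reset_index()
print(list(pivoted.columns))   # ['level_0', 'level_1', 0]
print(len(pivoted))            # 27
```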

Now, this is what our data looks like:

  letter                time  value
0      x 2019-03-13 12:00:00      1
1      x 2019-03-13 01:00:00      2
2      x 2019-03-13 02:00:00      3
3      x 2019-03-13 03:00:00      4
4      x 2019-03-13 04:00:00      5
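If you would rather skip the rename step, naming the row and column axes before unstacking lets reset_index produce the final column names directly. A minimal sketch of that variant, using the same explicit "%H:%M" parsing assumption:

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                     'y': [9, 8, 7, 6, 5, 4, 3, 2, 1],
                     'z': np.random.randint(0, 9, 9)},
                    index=["12:00", "1:00", "2:00", "3:00", "4:00",
                           "5:00", "6:00", "7:00", "8:00"])
mydf.index = pd.to_datetime(mydf.index, format="%H:%M")

# With both axes named, unstack() yields a MultiIndex named
# ("letter", "time"), and naming the Series gives the value column.
pivoted = (mydf.rename_axis(index="time", columns="letter")
               .unstack()
               .rename("value")
               .reset_index())

print(list(pivoted.columns))  # ['letter', 'time', 'value']
```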

Unfortunately, seaborn's scatterplot doesn't handle datetime columns very well, so the simplest option is to extract the hour as an integer:

pivoted["hour"] = pivoted["time"].dt.hour
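One caveat: .dt.hour maps "12:00" to 12, so those points land to the right of 8:00 rather than before 1:00 as in the question's ordering. Assuming 12:00 is meant to come first, taking the hour modulo 12 restores that order on a numeric axis; the column name slot is hypothetical:

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                     'y': [9, 8, 7, 6, 5, 4, 3, 2, 1],
                     'z': np.random.randint(0, 9, 9)},
                    index=["12:00", "1:00", "2:00", "3:00", "4:00",
                           "5:00", "6:00", "7:00", "8:00"])
mydf.index = pd.to_datetime(mydf.index, format="%H:%M")
pivoted = (mydf.unstack().reset_index()
               .rename(columns={"level_0": "letter", "level_1": "time", 0: "value"}))
pivoted["hour"] = pivoted["time"].dt.hour

# Assumption: 12:00 should come first. hour % 12 maps 12 -> 0 and leaves
# 1..8 unchanged, so the numeric order matches 12:00, 1:00, ..., 8:00.
pivoted["slot"] = pivoted["hour"] % 12  # hypothetical column name

print(sorted(pivoted["hour"].unique().tolist()))  # [1, 2, 3, 4, 5, 6, 7, 8, 12]
print(sorted(pivoted["slot"].unique().tolist()))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

You would then pass x="slot" to sns.scatterplot instead of x="hour".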

With a data frame in this form, seaborn takes in the data easily:

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()  # apply seaborn's default theme

sns.scatterplot(data=pivoted, x="hour", y="value", hue="letter")
plt.show()
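For intuition about what hue="letter" does with the tidy frame, here is a rough matplotlib-only equivalent: group the rows by letter and draw each group in its own color with a legend entry. This is an illustrative sketch, not seaborn's implementation:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                     'y': [9, 8, 7, 6, 5, 4, 3, 2, 1],
                     'z': np.random.randint(0, 9, 9)},
                    index=["12:00", "1:00", "2:00", "3:00", "4:00",
                           "5:00", "6:00", "7:00", "8:00"])
mydf.index = pd.to_datetime(mydf.index, format="%H:%M")
pivoted = (mydf.unstack().reset_index()
               .rename(columns={"level_0": "letter", "level_1": "time", 0: "value"}))
pivoted["hour"] = pivoted["time"].dt.hour

# Split the tidy rows by letter and give each group its own scatter call;
# matplotlib's color cycle assigns a different color per call.
fig, ax = plt.subplots()
for letter, group in pivoted.groupby("letter"):
    ax.scatter(group["hour"], group["value"], label=letter)
ax.set_xlabel("hour")
ax.set_ylabel("value")
ax.legend(title="letter")
plt.show()
```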

Outputs:

[Resulting scatter plot]
