How to plot a wide dataframe with colors and linestyles based on different columns

Here's a dataframe of mine:

import pandas as pd

d = {'year': [2020, 2020, 2020, 2021, 2020, 2020, 2021],
     'month': [10, 11, 12, 1, 11, 12, 1],
     'class': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
     'val1': [2, 3, 4, 5, 1, 1, 1],
     'val2': [3, 3, 3, 3, 2, 3, 5]}

df = pd.DataFrame(data=d)

Output:

   year  month class  val1  val2
0  2020     10     A     2     3
1  2020     11     A     3     3
2  2020     12     A     4     3
3  2021      1     A     5     3
4  2020     11     B     1     2
5  2020     12     B     1     3
6  2021      1     B     1     5

I need to plot val1 and val2 over time, in different colors (say green and red). There are also two classes, A and B, and I'd like to plot the two classes in different line types (solid and dashed). So if the class is A, val1 might be solid green in the plot, and if the class is B, val1 might be dashed green. Likewise, if the class is A, val2 might be solid red, and if the class is B, val2 might be dashed red.

But I have a problem with the time (x-axis) that I need to resolve. First of all, the time is split across two columns (year and month), and the two classes have different numbers of rows. In the data above, class B doesn't start until November 2020.

My attempt to resolve this is to create a new index from the year and month:

import matplotlib.pyplot as plt

# fractional-year index, e.g. October 2020 -> 2020 + 10/12
df.index = df['year'] + df['month'] / 12
df.groupby('class')['val1'].plot(legend=True)
plt.show()

But this creates non-ideal tick labels on the x-axis (which I suppose I can rename later). While it differentiates the two classes, it doesn't do so in the way I want. Nor do I know how to add more columns to the plot. Please advise. Thanks.

While this can be done with pyplot and matplotlib, a higher-level interface like seaborn will substantially improve your experience with plotting multiple dimensions. See the docs for all the various ways you can label your data with seaborn.

Try:

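import pandas as pd
import seaborn as sns

# fractional-year x value, as in the question
df['time'] = df.year + df.month/12
# reshape val1/val2 into a single 'val' column; 'val_number' (1 or 2) records the source column
df1 = pd.wide_to_long(df, stubnames='val', i=['year', 'month', 'class'], j='val_number').reset_index()

# color encodes the class; line width encodes the value column
sns.lineplot(x='time', y='val', hue='class', size='val_number', data=df1)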

The dataframe will now be in "long" form, with one "val" per row for each "time" point, along with identifier labels you can use.

The plot will look a little messy, but that is because of how much you are trying to represent with a line plot.
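
For comparison, the plain pyplot/matplotlib route mentioned at the start of this answer might look like the sketch below. It is not part of the original answer: the names `wide`, `colors`, and `linestyles` and the `date` column are illustrative, and green/red with solid/dashed are simply the choices suggested in the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

d = {'year': [2020, 2020, 2020, 2021, 2020, 2020, 2021],
     'month': [10, 11, 12, 1, 11, 12, 1],
     'class': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
     'val1': [2, 3, 4, 5, 1, 1, 1],
     'val2': [3, 3, 3, 3, 2, 3, 5]}
wide = pd.DataFrame(data=d)

# assumption: use the first day of each month as the x value instead of a fractional year
wide['date'] = pd.to_datetime(wide[['year', 'month']].assign(day=1))

# illustrative mappings: value column -> color, class -> linestyle
colors = {'val1': 'green', 'val2': 'red'}
linestyles = {'A': 'solid', 'B': 'dashed'}

fig, ax = plt.subplots(figsize=(8, 4))
# one line per (class, value column) pair
for cls, group in wide.groupby('class'):
    for col, color in colors.items():
        ax.plot(group['date'], group[col], color=color,
                linestyle=linestyles[cls], marker='o', label=f'{col} ({cls})')
ax.legend()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Because each class is plotted from its own rows, class B simply starts one month later than class A.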

  1. Combine the 'year' and 'month' columns to create a column with a datetime dtype.
  2. pandas.DataFrame.melt is used to pivot the DataFrame from a wide to a long format.
  3. Plot using seaborn.relplot, which is a figure-level plot, to simplify setting the height and width of the figure.
    • Similar to seaborn.lineplot
    • Specify hue and style for color and linestyle, respectively.
  4. Use mdates to provide a nice format to the x-axis. Remove if not needed.
  • Tested with pandas 1.2.4, seaborn 0.11.1, and matplotlib 3.4.2.

Imports and Transform DataFrame

import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates  # required for formatting the x-axis dates
import matplotlib.pyplot as plt  # required for creating the figure when using sns.lineplot; not required for sns.relplot

# combine year and month to create a date column
df['date'] = pd.to_datetime(df.year.astype(str) + df.month.astype(str), format='%Y%m')

# melt the dataframe into a tidy format
df = df.melt(id_vars=['date', 'class'], value_vars=['val1', 'val2'])
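
The string concatenation above produces unpadded strings such as '20211' for January 2021 and leaves it to the '%Y%m' format to parse them. As a possible alternative that avoids building strings, and which is not part of the original answer, `pd.to_datetime` can assemble dates from a DataFrame with `year`, `month`, and `day` columns. The sketch below assumes the first day of each month is an acceptable stand-in; `wide` and `tidy` are illustrative names.

```python
import pandas as pd

wide = pd.DataFrame({'year': [2020, 2020, 2020, 2021, 2020, 2020, 2021],
                     'month': [10, 11, 12, 1, 11, 12, 1],
                     'class': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                     'val1': [2, 3, 4, 5, 1, 1, 1],
                     'val2': [3, 3, 3, 3, 2, 3, 5]})

# assemble dates from the year/month columns plus an assumed day of 1
wide['date'] = pd.to_datetime(wide[['year', 'month']].assign(day=1))

# same melt as above: one row per (date, class, variable)
tidy = wide.melt(id_vars=['date', 'class'], value_vars=['val1', 'val2'])
print(tidy.head())
```

The resulting `tidy` frame has the same `date`, `class`, `variable`, and `value` columns used by the plotting calls that follow.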

seaborn.relplot

# plot with seaborn
p = sns.relplot(data=df, kind='line', x='date', y='value', hue='variable', style='class', height=4, aspect=2, marker='o')

# format the x-axis - use as needed
# xfmt = mdates.DateFormatter('%Y-%m')
# p.axes[0, 0].xaxis.set_major_formatter(xfmt)

seaborn.lineplot

# set the figure height and width
fig, ax = plt.subplots(figsize=(8, 4))

# plot with seaborn
sns.lineplot(data=df, x='date', y='value', hue='variable', style='class', marker='o', ax=ax)

# format the x-axis
xfmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_formatter(xfmt)

# move the legend
ax.legend(bbox_to_anchor=(1.04, 0.5), loc="center left")

Melted df

         date class variable  value
0  2020-10-01     A     val1      2
1  2020-11-01     A     val1      3
2  2020-12-01     A     val1      4
3  2021-01-01     A     val1      5
4  2020-11-01     B     val1      1
5  2020-12-01     B     val1      1
6  2021-01-01     B     val1      1
7  2020-10-01     A     val2      3
8  2020-11-01     A     val2      3
9  2020-12-01     A     val2      3
10 2021-01-01     A     val2      3
11 2020-11-01     B     val2      2
12 2020-12-01     B     val2      3
13 2021-01-01     B     val2      5
