How to plot a wide dataframe with colors and linestyles based on different columns
Here's a dataframe of mine:
import pandas as pd

d = {'year': [2020, 2020, 2020, 2021, 2020, 2020, 2021],
     'month': [10, 11, 12, 1, 11, 12, 1],
     'class': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
     'val1': [2, 3, 4, 5, 1, 1, 1],
     'val2': [3, 3, 3, 3, 2, 3, 5]}
df = pd.DataFrame(data=d)
Output:
year month class val1 val2
0 2020 10 A 2 3
1 2020 11 A 3 3
2 2020 12 A 4 3
3 2021 1 A 5 3
4 2020 11 B 1 2
5 2020 12 B 1 3
6 2021 1 B 1 5
I need to plot val1 and val2 over time, in different colors (say green and red). There are also two classes, A and B, and I'd like to plot the two classes with different line types (solid and dashed). So if the class is A, val1 might be solid green and val2 solid red in the plot; if the class is B, val1 might be dashed green and val2 dashed red.
But I have a problem with the time (x-axis) that I need to resolve. First of all, the time is split across two columns (year and month), and the two classes have different numbers of rows. In the data above, class B doesn't start until November 2020.
My attempt to resolve this is to create a new index from the year and month:
import matplotlib.pyplot as plt

df.index = df['year'] + df['month'] / 12
df.groupby('class')['val1'].plot(legend=True)
plt.show()
But this creates non-ideal tick labels on the x-axis (which I suppose I can rename later). While it differentiates the two classes, it doesn't do so in the way I want, nor do I know how to add more columns to the plot. Please advise. Thanks.
While this can be done with pyplot and matplotlib, a higher-level interface like seaborn will substantially improve your experience with plotting multiple dimensions. See the seaborn docs for the various ways you can label your data.
Try:
import pandas as pd
import seaborn as sns
df['time'] = df.year + df.month/12
df1 = pd.wide_to_long(df, stubnames='val', i=['year', 'month', 'class'], j='val_number').reset_index()
sns.lineplot(x='time', y='val', hue='class', size='val_number', data=df1)
The dataframe will now be in "long" form, with one "val" per "time" point and associated identifier labels you can use. The plot will look a little messy, but that is because of how much you are trying to represent with a single line plot.
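The answer above notes that this can also be done with pyplot and matplotlib directly. A minimal sketch of that route follows, for comparison; it is not part of the original answer. The names `wide`, `colors`, and `linestyles` are illustrative, and assembling the date from the year and month columns with the day fixed to 1 is an assumption made here to get a real datetime x-axis.

```python
import matplotlib.pyplot as plt
import pandas as pd

d = {'year': [2020, 2020, 2020, 2021, 2020, 2020, 2021],
     'month': [10, 11, 12, 1, 11, 12, 1],
     'class': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
     'val1': [2, 3, 4, 5, 1, 1, 1],
     'val2': [3, 3, 3, 3, 2, 3, 5]}
wide = pd.DataFrame(data=d)

# assumption: use the first day of each month as the timestamp
wide['date'] = pd.to_datetime(wide[['year', 'month']].assign(day=1))

colors = {'val1': 'green', 'val2': 'red'}  # one color per value column
linestyles = {'A': '-', 'B': '--'}         # one linestyle per class

fig, ax = plt.subplots(figsize=(8, 4))
# one line per (class, value column) pair, drawn straight from the wide frame
for cls, group in wide.groupby('class'):
    for col, color in colors.items():
        ax.plot(group['date'], group[col], color=color,
                linestyle=linestyles[cls], marker='o',
                label=f'{col}, class {cls}')
ax.legend()
fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap
```

Call plt.show() afterwards to display the figure. Because each class is plotted against its own dates, class B simply starts one month later on the shared x-axis.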
Combine the 'year' and 'month' columns to create a column with a datetime dtype.
pandas.DataFrame.melt is used to pivot the DataFrame from a wide to long format.
Plot with seaborn.relplot, which is a figure-level plot, to simplify setting the height and width of the figure.
The same plot made with the axes-level seaborn.lineplot is shown after it.
Specify hue and style for color and linestyle, respectively.
Use mdates to provide a nice format to the x-axis. Remove if not needed.
Tested with pandas 1.2.4, seaborn 0.11.1, and matplotlib 3.4.2.
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates # required for formatting the x-axis dates
import matplotlib.pyplot as plt # required for creating the figure when using sns.lineplot; not required for sns.relplot
# combine year and month to create a date column
df['date'] = pd.to_datetime(df.year.astype(str) + df.month.astype(str), format='%Y%m')
# melt the dataframe into a tidy format
df = df.melt(id_vars=['date', 'class'], value_vars=['val1', 'val2'])
seaborn.relplot
# plot with seaborn
p = sns.relplot(data=df, kind='line', x='date', y='value', hue='variable', style='class', height=4, aspect=2, marker='o')
# format the x-axis - use as needed
# xfmt = mdates.DateFormatter('%Y-%m')
# p.axes[0, 0].xaxis.set_major_formatter(xfmt)
seaborn.lineplot
# set the figure height and width
fig, ax = plt.subplots(figsize=(8, 4))
# plot with seaborn
sns.lineplot(data=df, x='date', y='value', hue='variable', style='class', marker='o', ax=ax)
# format the x-axis
xfmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_formatter(xfmt)
# move the legend
ax.legend(bbox_to_anchor=(1.04, 0.5), loc="center left")
df
date class variable value
0 2020-10-01 A val1 2
1 2020-11-01 A val1 3
2 2020-12-01 A val1 4
3 2021-01-01 A val1 5
4 2020-11-01 B val1 1
5 2020-12-01 B val1 1
6 2021-01-01 B val1 1
7 2020-10-01 A val2 3
8 2020-11-01 A val2 3
9 2020-12-01 A val2 3
10 2021-01-01 A val2 3
11 2020-11-01 B val2 2
12 2020-12-01 B val2 3
13 2021-01-01 B val2 5