Double labels on Y-axis Matplotlib

I've made a barh graph with a scatter plot on top. The data is about 100 books and their publishing dates, along with the year each author was born and died. The barh shows the time the author was alive and the scatter plot shows the year their books were published.

The problem I am facing is plotting multiple books on one bar, as I now have duplicate bars with different books. I am creating the y-axis based on position in the array and I'm adding the labels later.

My relevant code:

# dataframe columns to arrays. (dataset is my pandas dataframe)
begin = np.array(dataset.BORN)
end = np.array(dataset.DIED)
book = np.array(dataset['YEAR (BOOK)'])

# Data to a barh graph (sideways bar)
plt.barh(range(len(begin)), end-begin, left=begin, zorder=2,
         color='#007acc', alpha=0.8, linewidth=5)

# Plots the books in a scatterplot. Changes marker color and shape.
plt.scatter(book, range(len(begin)), color='purple', s=30, marker='D', zorder=3)

# Sets the titles of the y-axis.
plt.yticks(range(len(begin)), dataset.AUTHOR)

# Sets start and end of the x-axis.
plt.xlim([1835, 2019])

# Shows the plt
plt.show()

Picture that shows part of my current graph: [image: part of the current graph]

I'd aggregate your dataset down so that you get a single author per row using groupby and use this to draw the bars, then join this back to get a value to use to draw the books, e.g.:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame([
    ['foo', 1950, 1990, 1980],
    ['foo', 1950, 1990, 1985],
    ['bar', 1930, 2000, 1970],
], columns=['author', 'born', 'died', 'published'])

This pulls in packages and creates a dummy dataset. Next we reduce this down to a single row per author, getting when they were born and died:

agg = df.groupby('author')[['born', 'died']].min().reset_index()
agg['auth_num'] = range(len(agg))

The reset_index turns author back into a normal column, and we create an arbitrary auth_num column. You might want to put a sort_values in there if you want to sort authors by something other than their name (which I'd recommend, as alphabetical order generally isn't the most useful).
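
As a minimal sketch of the sort_values suggestion, the authors could be ordered by birth year before they are numbered. The extra 'abc' row is an assumption added only so that the result differs from alphabetical order, and sorting by 'born' is one illustrative choice among many.

```python
import pandas as pd

# Dummy data; the 'abc' row is hypothetical, added so that birth-year
# order differs from alphabetical order.
df = pd.DataFrame([
    ['foo', 1950, 1990, 1980],
    ['foo', 1950, 1990, 1985],
    ['bar', 1930, 2000, 1970],
    ['abc', 1960, 2010, 1995],
], columns=['author', 'born', 'died', 'published'])

# One row per author (double brackets select a list of columns).
agg = df.groupby('author')[['born', 'died']].min().reset_index()

# Sort by birth year, then assign the row number used on the y-axis.
agg = agg.sort_values('born').reset_index(drop=True)
agg['auth_num'] = range(len(agg))

print(agg)
```

With this ordering the earliest-born author gets auth_num 0 and therefore the bottom bar.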

Next we can join this back on to the original dataset to get an author number for each book:

df2 = pd.merge(df, agg[['author', 'auth_num']], on='author')
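
A possible alternative to the merge, not part of the answer itself, is pandas' GroupBy.ngroup, which numbers each row by its group. This sketch assumes the authors keep the default groupby order (sorted by name); if the aggregated table is re-sorted, the merge remains the safer route.

```python
import pandas as pd

df = pd.DataFrame([
    ['foo', 1950, 1990, 1980],
    ['foo', 1950, 1990, 1985],
    ['bar', 1930, 2000, 1970],
], columns=['author', 'born', 'died', 'published'])

# Number every book row by its author's group, in sorted-key order.
df2 = df.copy()
df2['auth_num'] = df2.groupby('author').ngroup()

print(df2[['author', 'published', 'auth_num']])
```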

and finally plot it all:

plt.barh(agg.auth_num, agg.died - agg.born, left=agg.born, zorder=-1, alpha=0.5)
plt.yticks(agg.auth_num, agg.author)

plt.scatter(df2.published, df2.auth_num)

giving something like:

[image: demo plot using seaborn]

Note: if you set use_sticky_edges to False on the axes before calling barh, it'll allow the x-axis to auto-scale, and hence the left-most author won't "stick" to the left-hand margin.
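
A small sketch of that note, assuming an explicit Axes object rather than the pyplot state machine used above. The bar values are dummy numbers chosen for illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Turn sticky edges off so autoscaling adds a margin left of the bars.
ax.use_sticky_edges = False

# Two dummy authors: bars start at the birth year and span the lifetime.
ax.barh([0, 1], [70, 40], left=[1930, 1950], alpha=0.5)
ax.set_yticks([0, 1])
ax.set_yticklabels(['bar', 'foo'])

# The lower x-limit now falls below the earliest birth year.
print(ax.get_xlim())
```

With pyplot-style code, the same attribute can be reached through plt.gca().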

Sure, there are several options you could use. You could either create another array for 1st, 2nd, 3rd books, or you could create a dictionary or list of arrays to plot the books per author.

I have reproduced some examples using dummy data below.

import matplotlib.pyplot as plt
import numpy as np

fig,axs = plt.subplots(1,1,figsize=(10,10))

# dummy arrays standing in for the dataframe columns
begin = np.arange(1900,1950)
end = np.arange(1975,2025)

# create two random arrays for your book dates
book1 = np.array(np.random.randint(low=1950, high=1970, size=50))
book2 = np.array(np.random.randint(low=1950, high=1970, size=50))

# add some author names
author_names = [f'Author_{x+1}' for x in range(50)]

# Data to a barh graph (sideways bar)
axs.barh(range(len(begin)), end-begin, left=begin, zorder=2,
         color='#007acc', alpha=0.8, linewidth=5)

# Plots the books in a scatterplot. Changes marker color and shape.
axs.scatter(book1, range(len(begin)), color='purple', s=30, marker='D', zorder=3, label='1st Book')

# second array of books
axs.scatter(book2, range(len(begin)), color='yellow', s=30, marker='D', zorder=3, label='2nd Book')

# or plot a custom array of books
# you could do this in a for loop for all authors
axs.scatter(x=[1980,2005], y=[10,45], color='red', s=50, marker='X', zorder=3, label='3rd Book')

# Sets the titles of the y-axis.
axs.set_yticks(range(len(begin)))
axs.set_yticklabels(author_names)

# Add legend
axs.legend()

# Sets start and end of the x-axis.
axs.set_xlim([1895, 2025])
axs.set_ylim([-1,50]);
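
The dictionary option mentioned at the start of this answer could look roughly like the sketch below. The names lifespans and books and all values are hypothetical; the point is only that each author gets one bar and any number of book markers on the same row.

```python
import matplotlib.pyplot as plt

# Hypothetical data: (born, died) per author and publication years per author.
lifespans = {
    'Author_1': (1900, 1975),
    'Author_2': (1910, 1990),
    'Author_3': (1925, 2001),
}
books = {
    'Author_1': [1930, 1948, 1960],
    'Author_2': [1955],
    'Author_3': [1962, 1980],
}

fig, ax = plt.subplots()
for row, (author, (born, died)) in enumerate(lifespans.items()):
    # One bar per author, spanning the author's lifetime.
    ax.barh(row, died - born, left=born, color='#007acc', alpha=0.8, zorder=2)
    # All of this author's books share the same y position as the bar.
    years = books[author]
    ax.scatter(years, [row] * len(years), color='purple', s=30, marker='D', zorder=3)

# Label each row with the author name.
ax.set_yticks(range(len(lifespans)))
ax.set_yticklabels(list(lifespans))
```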


(Next time please include a dataframe example!)

I would use the great numpy.unique function to perform the grouping operation.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


dataset = pd.DataFrame({'BORN': [1900, 1920, 1900],
                        'DIED': [1980, 1978, 1980],
                        'AUTHOR': ['foo', 'bar', 'foo'],
                        'YEAR (BOOK)': [1950, 1972, 1961]})

# --group by author
unique_authors, index, reverse_index = np.unique(dataset.AUTHOR.values, return_index=True, return_inverse=True)
authors_df = dataset.loc[index, ['AUTHOR', 'BORN', 'DIED']]
dataset['AUTHOR_IDX'] = reverse_index  # remember the index

# dataframe columns to arrays.
begin = authors_df.BORN.values
end = authors_df.DIED.values
authors = authors_df.AUTHOR.values

# --Author data to a barh graph (sideways bar)
plt.barh(range(len(begin)), end-begin, left=begin, zorder=2, color='#007acc', alpha=0.8, linewidth=5)

# Sets the titles of the y-axis.
plt.yticks(range(len(begin)), authors)

# Sets start and end of the x-axis.
plt.xlim([1835, 2019])

# --Overlay book information
# dataframe columns to arrays
book = dataset['YEAR (BOOK)'].values

# Plots the books in a scatterplot. Changes marker color and shape.
plt.scatter(book, reverse_index, color='purple', s=30, marker='D', zorder=3)

# Shows the plt
plt.show()

Yields:

[image: result_bargraph]
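
To make the grouping step easier to follow, here is a small sketch of what numpy.unique returns for the same three author names used in the dummy dataset above.

```python
import numpy as np

authors = np.array(['foo', 'bar', 'foo'])

unique_authors, index, inverse = np.unique(
    authors, return_index=True, return_inverse=True)

print(unique_authors)  # sorted unique names: ['bar' 'foo']
print(index)           # first occurrence of each unique name: [1 0]
print(inverse)         # row -> position in unique_authors: [1 0 1]
```

The index array picks one row per author for the bars, and the inverse array gives each book the y position of its author's bar.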
