
Y-axis values cut off using seaborn scatter plot

I have an issue plotting a large CSV file whose Y-axis values range from 1 up to 20+ million. I am facing two problems right now.

  1. The Y-axis does not show all the values it is supposed to. With the original data, it shows values up to 6 million instead of all the data up to 20 million. With the sample data (smaller data) I put below, it only shows the first Y-axis value and no others.

  2. In the legend, since I am using hue and style = name, "name" appears as the legend title and as an item inside it.

Questions:

  1. Could anyone give me a sample or help me work out how to show all the Y-axis values? How can I fix it so all the Y-values show up?

  2. How can I get rid of "name" in the legend without losing the shapes and colors of the scatter points?

(Please let me know if any sources exist or if this question was already answered in another post, without marking it as a duplicate. Please also let me know if I have any grammar/spelling issues that I need to fix. Thank you!)

Below you can find the function I am using to plot the graph, and the sample data.

def test_graph (file_name):

    data_file = pd.read_csv(file_name, header=None, error_bad_lines=False, delimiter="|", index_col = False, dtype='unicode')
    data_file.rename(columns={0: 'name',
                              1: 'date',
                              2: 'name3',
                              3: 'name4',
                              4: 'name5',
                              5: 'ID',
                              6: 'counter'}, inplace=True)

    data_file.date = pd.to_datetime(data_file['date'], unit='s')
    
    norm = plt.Normalize(1,4)
    cmap = plt.cm.tab10

    df = pd.DataFrame(data_file)
 
    # Below creates and returns a dictionary of category-point combinations,
    # by cycling over the marker points specified.   
    points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
    mult = len(df['name']) // len(points) + (len(df['name']) % len(points) > 0)
    markers = {key:value for (key, value)
               in zip(df['name'], points * mult)} ; markers
   
    sc = sns.scatterplot(data = df, x=df['date'], y=df['counter'], hue = df['name'], style = df['name'], markers = markers, s=50)
    ax.set_autoscaley_on(True)             
    
    ax.set_title("TEST", size = 12, zorder=0)      
            
    plt.legend(title="Names", loc='center left', shadow=True, edgecolor = 'grey', handletextpad = 0.1, bbox_to_anchor=(1, 0.5))             
               
    ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(100))               
               
    plt.xlabel("Dates", fontsize = 12, labelpad = 7)
    plt.ylabel("Counter", fontsize = 12)
    plt.grid(axis='y', color='0.95')
    
    fig.autofmt_xdate(rotation = 30)     
              
fig = plt.figure(figsize=(20,15),dpi=100)
ax = fig.add_subplot(1,1,1)                
test_graph(file_name)

plt.savefig(graph_results + "/Test.png", dpi=100)               

# Prevents to cut-off the bottom labels (manually) => makes the bottom part bigger
plt.gcf().subplots_adjust(bottom=0.15)
plt.show()

          

Sample data

namet1|1582334815|ai1|ai1||150|101
namet1|1582392415|ai2|ai2||142|105
namet2|1582882105|pc1|pc1||1|106
namet2|1582594106|pc1|pc1||1|123
namet2|1580592505|pc1|pc1||1|141
namet2|1580909305|pc1|pc1||1|144
namet3|1581974872|ai3|ai3||140|169
namet1|1581211616|ai4|ai4||134|173
namet2|1582550907|pc1|pc1||1|179
namet2|1582608505|pc1|pc1||1|185
namet4|1581355640|ai5|ai5|bcu|180|298466
namet4|1582651641|pc2|pc2||233|298670
namet5|1582406860|ai6|ai6|bcu|179|298977
namet5|1580563661|pc2|pc2||233|299406
namet6|1581283626|qe1|q0/1|Link to btse1/3|51|299990
namet7|1581643672|ai5|ai5|bcu|180|300046
namet4|1581758842|ai6|ai6|bcu|179|300061
namet6|1581298027|qe2|q0/2|Link to btse|52|300064
namet1|1582680415|pc2|pc2||233|300461
namet6|1581744427|pc3|p90|Link to btsi3a4|55|6215663
namet6|1581730026|pc3|p90|Link to btsi3a4|55|6573348
namet6|1582190826|qe2|q0/2|Link to btse|52|6706378
namet6|1582190826|qe1|q0/1|Link to btse1/3|51|6788568
namet1|1581974815|pc2|pc2||233|6895836
namet4|1581974841|pc2|pc2||233|7874504
namet6|1582176427|qe1|q0/1|Link to btse1/3|51|9497687
namet6|1582176427|qe2|q0/2|Link to btse|52|9529133
namet7|1581974872|pc2|pc2||233|9573450
namet6|1582162027|pc3|p90|Link to btsi3a4|55|9819491
namet6|1582190826|pc3|p90|Link to btsi3a4|55|13494946
namet6|1582176427|pc3|p90|Link to btsi3a4|55|19026820

Results I am getting:

Big data: [image: big data result]

Small data: [image: small data result]

Updated graph: [image: updated graph]

First of all, some improvements on your post: you are missing the import statements

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker
import seaborn as sns

The line

df = pd.DataFrame(data_file)

is not necessary, since data_file already is a DataFrame. The lines

points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
mult = len(df['name']) // len(points) + (len(df['name']) % len(points) > 0)
markers = {key:value for (key, value)
           in zip(df['name'], points * mult)}

do not cycle through points as you might expect: the dictionary is built row by row, so a name that occurs on several rows simply keeps the last marker paired with it. Maybe use itertools instead. Also, setting yticks like

ax.yaxis.set_major_locator(ticker.MultipleLocator(100))

for every 100 might be too much if your data spans values from 0 to 20 million; consider replacing 100 with, say, 1000000.
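To get a feel for why a step of 100 is too fine, here is a rough sketch that counts how many multiples of the step fall inside the data range. The helper name estimated_tick_count is made up for illustration, and the figure is only an estimate, since matplotlib may place a tick or two beyond the data limits.

```python
import math


def estimated_tick_count(vmin, vmax, base):
    """Count the multiples of `base` that fall inside [vmin, vmax]."""
    first = math.ceil(vmin / base)   # index of the first multiple >= vmin
    last = math.floor(vmax / base)   # index of the last multiple <= vmax
    return max(0, last - first + 1)


# Range taken from the question: counter values from 1 up to about 20 million.
for base in (100, 1_000_000):
    print(base, estimated_tick_count(1, 20_000_000, base))
```

For this range the count is 200000 ticks with a step of 100, against 20 with a step of 1000000.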

I was able to reproduce your first problem. Using df.dtypes I found that the column counter was stored as type object. Adding the line

df['counter']=df['counter'].astype(int)

resolved your first problem for me. I couldn't reproduce your second issue, though. Here is what the resulting plot looks like for me: [image: resulting plot] Have you tried updating all your packages to the latest version?
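If the real file might contain malformed counter values (the read call skips bad lines, so that seems possible), a slightly more defensive variant of the conversion could look like the sketch below. It uses three rows from the sample data; using pd.to_numeric with errors="coerce" and then dropping the unparsable rows is my assumption about what you would want, not something the data requires.

```python
import io

import pandas as pd

# Three rows in the same pipe-delimited layout as the sample data.
raw = io.StringIO(
    "namet1|1582334815|ai1|ai1||150|101\n"
    "namet6|1581744427|pc3|p90|Link to btsi3a4|55|6215663\n"
    "namet6|1582176427|pc3|p90|Link to btsi3a4|55|19026820\n"
)

names = ["name", "date", "name3", "name4", "name5", "ID", "counter"]
df = pd.read_csv(raw, header=None, names=names, delimiter="|", dtype=str)

# Every column is text at this point, so matplotlib would treat the
# counter values as categories rather than numbers.
dtype_before = df["counter"].dtype

# errors="coerce" turns unparsable entries into missing values instead of
# raising; dropping those rows afterwards is an assumption, adjust as needed.
df["counter"] = pd.to_numeric(df["counter"], errors="coerce")
df = df.dropna(subset=["counter"])
df["counter"] = df["counter"].astype("int64")

print(dtype_before, "->", df["counter"].dtype)
```

Once the column is numeric, the y-axis is scaled by value, and the points up to 20 million get their proper positions.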


EDIT: as a follow-up to your comment, you can also adjust the number of xticks in your plot by replacing 1 in

ax.xaxis.set_major_locator(ticker.MultipleLocator(1))

with a higher number, say 10. Incorporating all my suggestions and deleting the seemingly unnecessary function definition, my version of your code looks as follows:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker
import seaborn as sns
import itertools

fig = plt.figure()
ax  = fig.add_subplot()

df = pd.read_csv(
    'data.csv',
    header          = None,
    on_bad_lines    = 'skip',   # pandas >= 1.3; error_bad_lines=False was removed in pandas 2.0
    delimiter       = "|",
    index_col       = False,
    dtype           = 'unicode')
df.rename(columns={0: 'name',
                   1: 'date',
                   2: 'name3',
                   3: 'name4',
                   4: 'name5',
                   5: 'ID',
                   6: 'counter'}, inplace=True)

df.date = pd.to_datetime(df['date'], unit='s')
df['counter'] = df['counter'].astype(int)

points  = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
markers = itertools.cycle(points) 
markers = list(itertools.islice(markers, len(df['name'].unique())))

sc = sns.scatterplot(
    data    = df,
    x       = 'date',
    y       = 'counter',
    hue     = 'name',
    style   = 'name',
    markers = markers,
    s       = 50)           

ax.set_title("TEST", size = 12, zorder=0)             
ax.legend(
    title          = "Names",
    loc            = 'center left',
    shadow         = True,
    edgecolor      = 'grey',
    handletextpad  = 0.1,
    bbox_to_anchor = (1, 0.5))             
           
ax.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1000000))             
ax.minorticks_off()
      
ax.set_xlabel("Dates", fontsize = 12, labelpad = 7)
ax.set_ylabel("Counter", fontsize = 12)
ax.grid(axis='y', color='0.95')

fig.autofmt_xdate(rotation = 30)  
plt.gcf().subplots_adjust(bottom=0.15)   
plt.show()
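On the second issue, which I could not reproduce: as far as I know, some older seaborn releases add the hue variable's name as an extra first entry in the legend. If updating is not an option, one possible workaround is to filter that entry out before building the legend. This is only a sketch, and drop_legend_entry is a hypothetical helper name, not part of seaborn or matplotlib.

```python
def drop_legend_entry(handles, labels, unwanted):
    """Return the handles and labels without entries labelled `unwanted`."""
    kept = [(h, l) for h, l in zip(handles, labels) if l != unwanted]
    if not kept:
        return [], []
    kept_handles, kept_labels = zip(*kept)
    return list(kept_handles), list(kept_labels)


# Intended use with the axes from the plotting code (assumes `ax` exists):
#   handles, labels = ax.get_legend_handles_labels()
#   handles, labels = drop_legend_entry(handles, labels, "name")
#   ax.legend(handles=handles, labels=labels, title="Names")
```

Markers and colors are untouched, because only the handle/label pair for the stray entry is removed.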
