How to scatter plot with 1 million points

I'm trying to write a program that draws a graph from points given in a CSV file, where each row contains 4 strings (number of the point, x position, y position, color). The time it takes is ridiculously high, so I'm looking for ideas to make it faster.

from matplotlib import pyplot as plt    
from matplotlib import style   
import csv

style.use('ggplot')

s = 0.5
with open('total.csv') as f:
  f_reader = csv.reader(f, delimiter=',')
  for row in f_reader:
    plt.scatter(str(row[1]), str(row[2]), color=str(row[3]), s=s)
plt.savefig("graph.png", dpi=1000)

The first step would be to call scatter once instead of once for every point. Without adding a dependency on numpy or pandas, it could look like:

from matplotlib import pyplot as plt
from matplotlib import style
import csv

style.use("ggplot")

s = 0.5
x = []
y = []
c = []
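with open("total.csv", newline="") as f:
    f_reader = csv.reader(f, delimiter=",")
    # Assumes no header row; call next(f_reader) here first if the file has one.
    for row in f_reader:
        # Convert coordinates to numbers; strings would be plotted as categories.
        x.append(float(row[1]))
        y.append(float(row[2]))
        c.append(row[3])
# One scatter call for all points instead of one call per point.
plt.scatter(x, y, color=c, s=s)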
plt.savefig("graph.png", dpi=1000)

Then maybe try pandas.read_csv, which gives you a pandas DataFrame and lets you access the columns of your CSV without a for loop. That would probably be faster.

Each time you try a variation, measure the time it takes (possibly on a smaller file) to know what helps and what doesn't. In other words, don't try to improve performance blindly.
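One way to do that measuring is sketched below, using only the standard library. Both helpers (write_sample and time_call) are hypothetical names made up for this illustration, not part of matplotlib or pandas: the first copies the first rows of the big file into a smaller one, the second reports the best wall-clock time of a few runs.

```python
import csv
import itertools
import time


def write_sample(src_path, dst_path, n_rows):
    """Copy the first n_rows rows of src_path into dst_path; return rows written."""
    count = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in itertools.islice(csv.reader(src), n_rows):
            writer.writerow(row)
            count += 1
    return count


def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls of func(*args)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

For instance, after write_sample("total.csv", "sample.csv", 10000), wrap each plotting variant in a function that takes the file name, then compare time_call(variant, "sample.csv") across variants. Each variant should create and close its own figure so repeated runs don't accumulate points.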

Using pandas, it would look like:

from matplotlib import pyplot as plt
from matplotlib import style
import pandas as pd

style.use("ggplot")

# Assumes total.csv has a header row naming the columns x, y and color.
total = pd.read_csv("total.csv")
plt.scatter(total.x, total.y, color=total.color, s=0.5)
plt.savefig("graph.png", dpi=1000)
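If the file has no header row, which is what the question's description of the rows suggests, the columns can be named while reading. This is only a sketch: the column names below are assumptions derived from the row layout in the question (point number, x, y, color), and load_points is a hypothetical helper.

```python
import io

import pandas as pd

# Assumed names for the four columns described in the question.
COLUMNS = ["point", "x", "y", "color"]


def load_points(source):
    """Read a header-less CSV (path or file-like object) with named columns."""
    return pd.read_csv(source, header=None, names=COLUMNS)


# Tiny in-memory sample standing in for total.csv.
sample = io.StringIO("1,0.5,1.5,red\n2,2.0,3.0,blue\n")
points = load_points(sample)
print(points.dtypes)  # x and y are parsed as numbers, color stays text
```

The resulting points.x, points.y and points.color can then be passed to plt.scatter exactly as in the snippet above.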

If you want to learn more about pandas good practices for performance, I like the talk "No more sad pandas"; take a look at it.
