Plotting scatter points with matplotlib from a CSV file

I have extracted some data from a website into a CSV file, and I need to draw a scatter plot in matplotlib from that file. I only need the data in columns 2 and 3.

I'm trying to use a for loop to gather the CSV data into lists and then plot those lists, but I get a "ValueError: x and y must be the same size" error.

import matplotlib.pyplot as plt
import csv

with open(cache_path + distance_csv) as csv_file:
    reader = csv.reader(csv_file)

for column in reader:
    city_distance = [x[1] for x in csv.reader(csv_file)]
    crime_rate = [x[2] for x in csv.reader(csv_file)]

    plt.scatter(city_distance, crime_rate)
    plt.show()

Columns 2 and 3 in my CSV file are the same length (83 cells each), yet I get a ValueError. What am I missing here?

Your code has several bugs. I can't tell which of them causes the behaviour you see, but once you fix them all you should be able to make progress:

  1. First, look at how you read the columns. You iterate over column in reader but never use column; instead, every pass of the for loop creates two new csv.reader objects. A possible fix appears later in this answer.
  2. Related to that, you read csv_file outside the scope of the with statement, by which point the file is already closed. If you use the for loop and column inside the with block, you no longer need those extra reads.
  3. You plot on every iteration, so you would create 83 plots, which I guess you don't want.
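As an illustration of the first point, here is a minimal sketch of what happens when two new readers are built over the same open file, assuming the loop runs while the file is still readable. The sample data and the use of io.StringIO in place of the real file are assumptions made for the demonstration.

```python
import csv
import io

# Hypothetical three-column data standing in for the real CSV file
sample = "a,1.0,10\nb,2.0,20\nc,3.0,30\n"
csv_file = io.StringIO(sample)

reader = csv.reader(csv_file)
first_row = next(reader)  # what the first pass of the for loop consumes

# The first comprehension reads every remaining row...
city_distance = [x[1] for x in csv.reader(csv_file)]
# ...so the second one finds the file already exhausted
crime_rate = [x[2] for x in csv.reader(csv_file)]

print(first_row, len(city_distance), len(crime_rate))
```

Under these assumptions the two lists end up with different lengths, which would be enough for plt.scatter to reject them.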

So a possible solution would be:

import matplotlib.pyplot as plt
import csv

city_distance, crime_rate = [], []
with open(cache_path + distance_csv) as csv_file:
    reader = csv.reader(csv_file)
    # Iterate while the file is still open; each item is one row
    for column in reader:
        city_distance.append(float(column[1]))
        crime_rate.append(float(column[2]))

# Plot once, after all rows have been read
plt.scatter(city_distance, crime_rate)
plt.show()

In future, I recommend verifying that len(city_distance) == len(crime_rate). That is, check your data in the code rather than in the CSV file, after reading the values and right before the line that raises the error, so that you have the most useful information to proceed.
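One way to do that check is sketched below. The helper name check_same_length and the sample lists are hypothetical, not part of the original code; the idea is only to report each list's length before calling plt.scatter.

```python
def check_same_length(**columns):
    """Return each column's length, or raise if the lengths differ.

    Hypothetical helper for debugging; pass the lists as keyword arguments.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return lengths


# Illustrative values only
city_distance = [1.5, 2.5, 3.5]
crime_rate = [10.0, 20.0, 30.0]
print(check_same_length(city_distance=city_distance, crime_rate=crime_rate))
```

Calling it right before plt.scatter(city_distance, crime_rate) means a mismatch shows up with both lengths named, instead of only the generic size error.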
