
How do I make a graph/diagram from a CSV file in Python?

This is my first time asking a question on this forum, so hopefully I won't make a fool of myself. I am a student in an IT program, and I was briefly introduced to the csv and Matplotlib libraries today. My teacher gave me a CSV file to play around with, along with some assignments. One of the assignments was to make a graph/diagram of the maximum and minimum temperatures and the corresponding dates in the CSV file. I need to pick out the right columns, and I need the program to understand the format of the cells (dates and numbers), but I am really not sure how to do that. [Image: snippet of the CSV file] This is what I have so far:

import csv
import matplotlib.pyplot as plt

filename = 'death_valley_2018_simple.csv'
with open(filename) as f:
    csv_reader = csv.reader(f, delimiter=',')
    line_count = 0

    for row in f:
        x=(row[4], row[5])
        y=(row[2])
        print(row[2])
        print(row[4])
        print(row[5])

plt.bar(x,y)
plt.xticks(y)
plt.ylabel('Dates')
plt.title('Plot')
plt.show()

The result is this "bar graph" [image]. I read other posts on this forum, asked around on Discord, and read the documentation for the csv module. Maybe the answer is there, but if so I don't understand it. There are 365 lines in the file, so it might be nice to limit the program to the first 10 lines instead of the whole file, but I'm not sure how to do that either. I hope someone can explain this to me like I'm 5 years old.

Personal Advice

Don't worry; I got you. But first, some advice. I remember when I posted my first question on this forum: I didn't know the proper way to ask a question (and my English wasn't that good at the time). The key to asking a good question is to search first (which you did) and then, if you don't find an answer, to ask your question as clearly and as briefly as possible. I'm not saying you should leave out information, but if you can ask your question in fewer words and it is still clear, you should do it.

Why? Because the truth is that many people will skip a question if it is long. Just now, when I opened your question and saw all those lines, I was a little intimidated and wanted to skip it :D, but I solved it in a few minutes, and it wasn't scary at all. I am less concerned about writing long answers, because people who have the problem will read the answer if they have to.

Please note that all of this is just my personal experience. You should also look for better beginner guides on how to ask questions on this forum and similar platforms. My suggestion: http://www.catb.org/~esr/faqs/smart-questions.html

Now the Answer

Instead of the csv library, which is part of the Python standard library (meaning it ships with Python and does not need to be installed separately), I prefer using pandas. pandas will make your life much easier, but you have to install it first:

pip install pandas

Now it's quite simple. Let's import everything and load the CSV file.

import pandas as pd
import matplotlib.pyplot as plt

filename = 'death_valley_2018_simple.csv'
dataframe = pd.read_csv(filename)
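You also asked how to work with only the first 10 rows. Here is a minimal sketch of two ways to do that, assuming the file has the DATE, TMAX and TMIN columns used in this answer. The sample data below is made up and read from an in-memory string, so the snippet runs without your file.

```python
import io
import pandas as pd

# Made-up stand-in for the real file (12 data rows), so the sketch runs anywhere.
# The DATE/TMAX/TMIN column names are assumed to match the real CSV.
sample_csv = "DATE,TMAX,TMIN\n" + "\n".join(
    f"2018-01-{day:02d},{60 + day},{35 + day}" for day in range(1, 13)
)

# Option 1: read only the first 10 data rows (the header line is not counted).
first_ten = pd.read_csv(io.StringIO(sample_csv), nrows=10)

# Option 2: read everything, then keep the first 10 rows.
everything = pd.read_csv(io.StringIO(sample_csv))
also_first_ten = everything.head(10)

print(first_ten)
```

With your real file, pass filename in place of io.StringIO(sample_csv). Both options should give the same 10 rows; nrows just avoids reading the rest of the file.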

dataframe now contains the rows and columns of your CSV file. You can plot the minimum and maximum temperature for each date directly:

plt.plot(dataframe["DATE"], dataframe["TMAX"])
plt.plot(dataframe["DATE"], dataframe["TMIN"])

But it's not going to look pretty, because the DATE column is read as plain strings, so matplotlib treats every date as a separate category and draws a tick label for each one. It can't tell that this column is a time series. We need to convert the column to datetime.

dataframe["DATE"] = pd.to_datetime(dataframe['DATE'], format="%Y-%m-%d")

Here we are telling pandas to convert the DATE column to datetime, and the format argument tells it where the year, month and day are in each string. %Y represents the four-digit year, then there is a dash, %m represents the month, then another dash, and %d represents the day. We use a capital Y because lowercase %y stands for a two-digit year (for example 18 instead of 2018). In this case the format is so straightforward that pandas will work out how to convert the column even if you don't specify it.
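To see the format codes in action, here is a small sketch on a few made-up dates written in the same YYYY-MM-DD layout. The variable names are only for illustration.

```python
import pandas as pd

# Made-up sample dates in the same YYYY-MM-DD layout as the DATE column.
dates = pd.Series(["2018-01-01", "2018-01-02", "2018-12-31"])
print(dates.dtype)  # still a plain string/object column at this point

# %Y = four-digit year, %m = two-digit month, %d = two-digit day.
parsed = pd.to_datetime(dates, format="%Y-%m-%d")
print(parsed.dtype)  # now a datetime column

# Lowercase %y expects a two-digit year, e.g. "18-01-01".
short_year = pd.to_datetime(pd.Series(["18-01-01"]), format="%y-%m-%d")
print(short_year.iloc[0].year)

# Once the column is a datetime, date parts are available through .dt
print(parsed.dt.month.tolist())  # [1, 1, 12]
```

If the format string does not match the text (for example %y with a four-digit year), pandas raises an error instead of guessing, which is a good reason to spell the format out.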

Now we just have to plot our diagram/graph just like before:

plt.plot(dataframe["DATE"], dataframe["TMAX"])
plt.plot(dataframe["DATE"], dataframe["TMIN"])
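As an optional refinement, you can label the two lines so it is clear which one is the maximum and which is the minimum. This is a sketch on made-up sample rows, assuming the same DATE, TMAX and TMIN column names; the colors and titles are my own choices, not something your assignment requires.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows; the column names are assumed to match the real file.
sample_csv = """DATE,TMAX,TMIN
2018-01-01,65,34
2018-01-02,61,38
2018-01-03,69,34
2018-01-04,69,39
"""
dataframe = pd.read_csv(io.StringIO(sample_csv))
dataframe["DATE"] = pd.to_datetime(dataframe["DATE"], format="%Y-%m-%d")

fig, ax = plt.subplots()
ax.plot(dataframe["DATE"], dataframe["TMAX"], color="red", label="TMAX")
ax.plot(dataframe["DATE"], dataframe["TMIN"], color="blue", label="TMIN")
ax.set_xlabel("Date")
ax.set_ylabel("Temperature")  # add the unit here once you know it
ax.set_title("Daily maximum and minimum temperature")
ax.legend()          # shows which line is TMAX and which is TMIN
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

When you run this as a script, finish with plt.show() to open the plot window, or fig.savefig with a file name of your choice to write an image file.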

So after doing everything, your code should look like this:

import pandas as pd
import matplotlib.pyplot as plt

filename = 'death_valley_2018_simple.csv'
dataframe = pd.read_csv(filename)

dataframe["DATE"] = pd.to_datetime(dataframe['DATE'], format="%Y-%m-%d")

plt.plot(dataframe["DATE"], dataframe["TMAX"])
plt.plot(dataframe["DATE"], dataframe["TMIN"])

# Open the plot window (needed when running this as a script)
plt.show()
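If your teacher wants you to stay with the csv module, here is a rough sketch of how the code from the question could be repaired. Two things went wrong there: the loop iterated over f (raw text lines) instead of csv_reader, so row[4] was a single character, and x and y were overwritten on every pass instead of being collected in lists. The function name read_temperatures and the sample rows are made up for illustration, and I am assuming the header row contains DATE, TMAX and TMIN.

```python
import csv
import io
from datetime import datetime
from itertools import islice

import matplotlib.pyplot as plt


def read_temperatures(lines, limit=None):
    """Return (dates, highs, lows) from CSV lines with DATE, TMAX, TMIN columns."""
    reader = csv.DictReader(lines)  # uses the header row as dictionary keys
    dates, highs, lows = [], [], []
    for row in islice(reader, limit):  # limit=None reads every row
        try:
            date = datetime.strptime(row["DATE"], "%Y-%m-%d")
            high = float(row["TMAX"])
            low = float(row["TMIN"])
        except ValueError:
            continue  # skip rows with a missing or malformed value
        dates.append(date)
        highs.append(high)
        lows.append(low)
    return dates, highs, lows


# Made-up sample rows. With the real file you would write:
#     with open(filename, newline="") as f:
#         dates, highs, lows = read_temperatures(f, limit=10)
sample = io.StringIO(
    "DATE,TMAX,TMIN\n"
    "2018-01-01,65,34\n"
    "2018-01-02,61,\n"  # missing TMIN, so this row is skipped
    "2018-01-03,69,34\n"
    "2018-01-04,69,39\n"
)
dates, highs, lows = read_temperatures(sample, limit=3)

fig, ax = plt.subplots()
ax.plot(dates, highs, label="TMAX")
ax.plot(dates, lows, label="TMIN")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

Here limit=3 looks at the first three data rows, and one of them is skipped because its TMIN is empty, so two points end up on each line. As with the pandas version, call plt.show() at the end when running this as a script.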
