
How do I plot my data using matplotlib in Python?

I've gotten the publication years of different books using this:

# -*- coding: utf-8 -*-
"""
Created on Fri Mar 22 13:12:11 2019

@author: Oppilas
"""
from __future__ import division
from matplotlib import pyplot as plt
from collections import Counter

import pandas as pd
import numpy as np
import re
import math


file = "BL-Flickr-Images-Book.csv"
df = pd.read_csv(file)
cnt = 0

for row in df['Date of Publication']:
    try:
        row += 0
    except TypeError:
        try:
            new_value = int(row)
            df.loc[cnt,'Date of Publication'] = new_value
        except ValueError:
            new_row = re.sub(r"\D", "", row)  # raw string so \D is a regex class, not a string escape
            df.loc[cnt,'Date of Publication'] = int(new_row[:4])
    cnt += 1


pub_years = []

for year in df['Date of Publication']:
    if math.isnan(year):
        continue
    else:
        if len(str(year)) >= 4:
            pub_years.append(year)

So, how do I plot this data sensibly using matplotlib? I've tried pyplot, but the graph line was all over the place. I also tried to look at the documentation for hist, but couldn't get it working.

Is the data I've extracted poor, or is it my lack of skill with matplotlib?

In general, you almost never need to iterate over rows to process your dataframe. You can just work on the columns directly. For example, this should work:

df.groupby('Year').size().plot(marker='o')
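To show what working on the column directly can look like, here is a minimal sketch that pulls a four-digit year out of each entry without a Python loop. The sample values are made up for illustration, and the Year column follows the renaming used in this answer. Note that it takes the first run of four consecutive digits, which is an assumption and slightly different from stripping all non-digits and keeping the first four characters.

```python
import pandas as pd

# Hypothetical sample standing in for the real 'Date of Publication' column
df = pd.DataFrame(
    {"Date of Publication": ["1879 [1878]", "1868", "[1897?]", None, "1868"]}
)

# Convert every entry to text, then take the first run of four digits;
# entries without such a run (including missing values) become NaN
year_text = df["Date of Publication"].astype(str).str.extract(r"(\d{4})", expand=False)
df["Year"] = pd.to_numeric(year_text, errors="coerce")

# Number of books per year, ordered by year and ready for plotting
year_counts = df["Year"].dropna().astype(int).value_counts().sort_index()
print(year_counts)
```

Calling year_counts.plot(marker='o') on the result should then draw one point per year, in chronological order.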

If you have some mangled dates, e.g. numbers like 61, 62, 63 instead of 1961, 1962, 1963, then perhaps you can correct them:

df.loc[df['Year']<100, 'Year'] = df['Year'] + 1900

I renamed your column df['Date of Publication'] to df['Year'] to make the examples a bit easier to read.
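Since the question also mentions hist, here is a rough sketch of a histogram with one bin per decade. A likely reason the pyplot line looked erratic is that plt.plot connects the values in list order, and the years are not sorted; a histogram avoids that because it only counts values per bin. The pub_years values below are invented stand-ins for the list built in the question, and the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
from matplotlib import pyplot as plt

# Hypothetical stand-in for the pub_years list built in the question
pub_years = [1879, 1868, 1897, 1868, 1851, 1879, 1868, 1902]

# Bin edges at every decade, covering the full range of the data
first_decade = min(pub_years) // 10 * 10
last_decade = max(pub_years) // 10 * 10 + 10
bins = list(range(first_decade, last_decade + 1, 10))

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(pub_years, bins=bins, edgecolor="black")
ax.set_xlabel("Year of publication")
ax.set_ylabel("Number of books")
fig.savefig("publication_years.png")
```

The decade-wide bins are only one choice; for a dataset spanning several centuries, wider bins may read better.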


 