Python - How to create a nested dictionary from sqlite3 columns and graph it using Matplotlib?

I am trying to build a nested dictionary from two columns of a sqlite3 database that lists the anime I've watched (several hundred entries). The "DateWatched" column holds the date or date range in which I watched a given anime (for example, Jun 6-Jun 8), and the "Year" column holds the year or years in which I watched it.

Here is a small example of the data in the two columns:

      DateWatched                | Year
---------------------------------+----------------
Dec 18-Dec 23                    | 2013
Dec 25-Jan 10                    | 2013 and 2014
Feb 2014 and Jan 1-Jan 3 2016    | 2014 and 2016   #Some anime get another season years later so any date after an "and" is another season
Mar 10th                         | 2014
Mar 13th                         | 2014

This is the basic structure of my two columns. What I want to do is store it in a dictionary or list and keep track of how many anime I watched each month (from Jan to Dec) for each year.

I think I want it to be something like this (based on my example):

Final = {'2013':{'Dec':2},
         '2014':{'Jan':1, 'Feb':1,'Mar':2},
         '2016':{'Jan':1}}

I figured out how to create a list of each column individually:

import re
from collections import OrderedDict

# c is an open sqlite3 cursor
MonthColumn = [i[0] for i in c.execute("SELECT DateWatched FROM Anime").fetchall()]  #'Anime' is just an arbitrary name for the table
x = [item.replace('-',' ') for item in [y for x in MonthColumn for y in re.split(' and ', x)]]  #Gets rid of '-' in each row and splits into two strings any place with an 'and'
v = [' '.join(OrderedDict((w,w) for w in item.split()).keys()) for item in x]  # Removes duplicate words ("Dec 18-Dec 23" becomes "Dec 18 23")
j = [y for j in v for y in j.split()]  #Splits into separate strings ("Dec 18 23" becomes "Dec", "18", "23")
Month = [item for item in j if item.isalpha()] #Final list and removes any string with numbers (So "Dec","18","23" becomes "Dec")

YearColumn = [i[0] for i in c.execute("SELECT Year FROM Anime").fetchall()]
Year = [item for Year in YearColumn for item in re.split(' and ', Year)]  #Final list and removes any "and" and splits the string into 2 (So "2013 and 2014" becomes "2013","2014")

#So in the example columns I gave above, my final lists become
Month = ['Dec','Dec','Jan','Feb','Jan','Mar','Mar']
Year =  ['2013','2013','2014','2014','2016','2014','2014']

The biggest problem, and where I need the most help, is converting the two lists into a nested dictionary (or something similar) and then using it in Matplotlib to draw a bar chart with the years on the x-axis (12 bars per year, one per month) and the number of anime watched in each month on the y-axis.

Thank you for your help and sorry if I missed anything or didn't include something (First time posting).

I suggest a slightly different parsing approach that handles the month-to-day ranges directly and produces the dictionary you want, which can then be plotted:

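import re, sqlite3
import itertools, collections
# 'db_tablename.db' and 'tablename' are placeholders for your database file and table
data = list(sqlite3.connect('db_tablename.db').cursor().execute("SELECT DateWatched, Year FROM tablename"))
# For each row: month names (alphabetic tokens except 'and') and years (digit runs)
new_parsed = [[list(filter(lambda x:x != 'and', re.findall(r'[a-zA-Z]+', a))), re.findall(r'\d+', b)] for a, b in data]
# Pair each row's months with its years in order (zip stops at the shorter list), then flatten
new_results = [i for b in [list(zip(*i)) for i in new_parsed] for i in b]
# Group the (month, year) pairs by year and count the months within each year
groups = {a:collections.Counter([c for c, _ in b]) for a, b in itertools.groupby(sorted(new_results, key=lambda x:x[-1]), key=lambda x:x[-1])}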

For the sample rows in the question, this gives {'2013': Counter({'Dec': 2}), '2014': Counter({'Mar': 2, 'Jan': 1, 'Feb': 1}), '2016': Counter({'Jan': 1})}.
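
If the flat Month and Year lists from the question are already built, the same counts can probably be produced without re-parsing the table. This is a minimal sketch, assuming the two lists line up index by index (true for the sample rows, but not guaranteed for every row in a real table). The names `counts` and `nested` are introduced here for illustration only:

```python
from collections import Counter, defaultdict

# The two flat lists from the question's sample data
Month = ['Dec', 'Dec', 'Jan', 'Feb', 'Jan', 'Mar', 'Mar']
Year = ['2013', '2013', '2014', '2014', '2016', '2014', '2014']

# Assumption: Month[i] and Year[i] describe the same viewing.
# zip() silently stops at the shorter list, so check the lengths first.
assert len(Month) == len(Year)

counts = defaultdict(Counter)
for month, year in zip(Month, Year):
    counts[year][month] += 1

# Convert to plain dicts to match the shape asked for in the question
nested = {year: dict(months) for year, months in counts.items()}
print(nested)
# {'2013': {'Dec': 2}, '2014': {'Jan': 1, 'Feb': 1, 'Mar': 2}, '2016': {'Jan': 1}}
```

The length check matters because a row whose month and year counts differ would shift every later pair by one; parsing both columns row by row, as in the code above, avoids that risk.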

To graph:

import matplotlib.pyplot as plt
months = ['Dec', 'Jan', 'Feb', 'Mar']  # months present in the sample data, stacked bottom to top
new_months = {a:[[i, b.get(i, 0)] for i in months] for a, b in groups.items()}
years = sorted(new_months, key=int)
# Draw one stacked segment per month; each segment sits on the sum of the months below it
for i in reversed(range(len(months))):
  _current = [new_months[y][i][-1] for y in years]
  _previous = [sum(c[-1] for c in new_months[y][:i]) for y in years]
  plt.bar(range(len(years)), _current, label = months[i], bottom = _previous)

plt.xticks(range(len(years)), years)
plt.legend(loc='upper left')
plt.show()
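
The question asks for 12 bars per year rather than one stacked bar. Here is a hedged sketch of that grouped layout, assuming `groups` has the shape shown above. `MONTHS`, `month_series`, `plot_grouped` and the bar `width` are illustrative choices, not part of the original answer:

```python
from collections import Counter

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def month_series(groups):
    """Return (years, {month: [count per year]}), using 0 for missing months."""
    years = sorted(groups, key=int)
    series = {m: [groups[y].get(m, 0) for y in years] for m in MONTHS}
    return years, series

def plot_grouped(groups, width=0.07):
    """Draw 12 side-by-side bars per year, one per month."""
    import matplotlib.pyplot as plt  # imported here so month_series works without Matplotlib
    years, series = month_series(groups)
    for k, month in enumerate(MONTHS):
        # Shift each month's bars so the 12 bars are centred on the year tick
        xs = [x + (k - (len(MONTHS) - 1) / 2) * width for x in range(len(years))]
        plt.bar(xs, series[month], width=width, label=month)
    plt.xticks(range(len(years)), years)
    plt.ylabel('Anime watched')
    plt.legend(ncol=3, fontsize='small')
    plt.show()

# Sample result from the parsing step
groups = {'2013': Counter({'Dec': 2}),
          '2014': Counter({'Mar': 2, 'Jan': 1, 'Feb': 1}),
          '2016': Counter({'Jan': 1})}
years, series = month_series(groups)
print(years)          # ['2013', '2014', '2016']
print(series['Mar'])  # [0, 2, 0]
```

Calling plot_grouped(groups) then draws the chart. With width=0.07 the 12 bars span 0.84 of each unit on the x-axis, so neighbouring years do not overlap.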

