
Pandas Line Graph by Month, Grouped by Industry from Timestamped SQL Export

Newbie question, thank you in advance!

I'm trying to group the data by both date and industry and display a chart that shows each industry's revenue across the time series in monthly increments.

I am working from a SQL export that has timestamps, and I'm having a bear of a time getting this to work.

I posted a sample CSV data file here: https://drive.google.com/open?id=0B4xdnV0LFZI1WGRMN3AyU2JERVU

Here's a small data example:

Industry     Date                Revenue
Fast Food   01-05-2016 12:18:02  100
Fine Dining 01-08-2016 09:17:48  110
Carnivals   01-18-2016 10:48:52  200

My failed attempt is here:

import pandas as pd
import datetime
import matplotlib.pyplot as plt

df = pd.read_csv('2012_to_12_27_2016.csv')

df['Ship_Date'] = pd.to_datetime(df['Ship_Date'], errors = 'coerce')
df['Year'] =  df.Ship_Date.dt.year
df['Ship_Date'] =  pd.DatetimeIndex(df.Ship_Date).normalize()
df.index = df['Ship_Date']
df_skinny = df[['Shipment_Piece_Revenue', 'Industry']]

groups = df_skinny[['Shipment_Piece_Revenue', 'Industry']].groupby('Industry')
groups = groups.resample('M').sum()
groups.index = df['Ship_Date']

fig, ax = plt.subplots()
groups.plot(ax=ax, legend=False)
names = [item[0] for item in groups]
ax.legend(ax.lines, names, loc='best')

plt.show()

You could use Series.unique (here, df.Industry.unique()) to get a list of all industries and then, using DataFrame.loc, define a new DataFrame object that only contains data from a single industry.

Then, if we set the Ship Date column as the index of the new DataFrame, we can use resample, specify the frequency as months, and call sum() to get the total revenue for each month.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('Graph_Sample_Data.csv')
df['Ship Date'] = pd.to_datetime(df['Ship Date'], errors='coerce')

fig, ax = plt.subplots()

for industry in df.Industry.unique():
    # Keep only this industry's rows and index them by ship date.
    industry_df = df.loc[df.Industry == industry].set_index('Ship Date')
    # Total revenue per calendar month ('M' = month end; pandas 2.2+ spells it 'ME').
    monthly = industry_df['Revenue'].resample('M').sum()
    # Draw one line per industry on the shared axes.
    monthly.plot(ax=ax, label=industry)

ax.legend(loc='best')
plt.show()
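
As an alternative to the loop, the same monthly totals can probably be built in one step by grouping on month and industry together and then pivoting the industries into columns. The following is only a sketch under assumptions: the column names Industry, Date and Revenue follow the small example in the question rather than the real export, and the last two rows of the inline sample are made up so that one industry spans two months.

```python
import io

import pandas as pd

# Hypothetical inline sample shaped like the question's small example.
# The last two rows are invented for illustration only.
csv_text = """Industry,Date,Revenue
Fast Food,01-05-2016 12:18:02,100
Fine Dining,01-08-2016 09:17:48,110
Carnivals,01-18-2016 10:48:52,200
Fast Food,01-20-2016 08:00:00,50
Fast Food,02-03-2016 14:30:00,75
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse the month-day-year timestamps explicitly (assumed format).
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y %H:%M:%S")

# Group by calendar month and industry, sum the revenue, then move the
# industries into columns; months with no rows for an industry become 0.
monthly = (
    df.groupby([df["Date"].dt.to_period("M"), "Industry"])["Revenue"]
    .sum()
    .unstack("Industry", fill_value=0)
)

print(monthly)
```

Calling monthly.plot() on the result should then draw one line per industry with a legend, since DataFrame.plot treats each column as its own series.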


 