After counting occurrences, how to make entries for each day in a list
I have two columns of dates in two separate csv files. I am reading them into Python and plan to plot them with matplotlib.
One is for invoices:
5/1/2015
5/1/2015
5/1/2015
5/2/2015
5/2/2015
5/2/2015
5/2/2015
5/3/2015
5/3/2015
5/3/2015
5/3/2015
5/4/2015
5/4/2015
5/4/2015
5/4/2015
5/4/2015
5/4/2015
5/5/2015
5/5/2015
5/5/2015
5/5/2015
5/7/2015
And the other is for disputes:
5/1/2015
5/1/2015
5/2/2015
5/2/2015
5/3/2015
5/5/2015
5/5/2015
I want to make a list with the number of disputes divided by the number of invoices per day. So for May 1, 2015 the percent is 2/3. For May 2, 2015 it is 2/4. May 3, 2015 is 1/4. May 4, 2015 is 0. May 5, 2015 is 2/4. There are no invoices or disputes on May 6, 2015, so the percent should be zero. May 7, 2015 has one invoice and no disputes, so it is also zero. Thus the list should be [.66, .5, .25, 0, .5, 0, 0]
Then I am going to graph the percents on the y-axis and the dates on the x-axis.
I have tried df.index.day, but that groups by day of the month (the first of every month together, the second of every month together, and so on). I also used value_counts to count the occurrences of each date and then divided one result by the other, but days with no invoices or no disputes were missing from the result, and I want a value for every day.
Does anyone know a simple way to do this?
One easy way to get the counts is to use collections.Counter:
from collections import Counter

# Count how many times each date string appears in the file
with open('invoice_dates') as f:
    invoice_count = Counter(line.strip() for line in f)
and similarly for dispute_count. You can then get a dictionary mapping dates to dispute percentages by
from __future__ import division  # in case you are on Python 2.x

# Dates with invoices but no disputes get 0 via dispute_count.get(date, 0)
dispute_percentage = {date: dispute_count.get(date, 0) / invoices
                      for date, invoices in invoice_count.items()}
Use iteritems() instead of items() in the last line if you are on Python 2.x.
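The dictionary above only has keys for dates that appear in the invoice file, so a day such as May 6 is still missing. One possible way to get a value for every calendar day is sketched below. It assumes Python 3, inlines the sample dates from the question instead of reading the csv files, and uses the names parse, days and ratios purely for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample data from the question, inlined here instead of being read from files.
invoices = (["5/1/2015"] * 3 + ["5/2/2015"] * 4 + ["5/3/2015"] * 4
            + ["5/4/2015"] * 6 + ["5/5/2015"] * 4 + ["5/7/2015"])
disputes = (["5/1/2015"] * 2 + ["5/2/2015"] * 2 + ["5/3/2015"]
            + ["5/5/2015"] * 2)


def parse(text):
    """Turn a month/day/year string into a date so days can be ordered."""
    return datetime.strptime(text, "%m/%d/%Y").date()


invoice_count = Counter(parse(d) for d in invoices)
dispute_count = Counter(parse(d) for d in disputes)

# Build every calendar day between the first and last invoice date.
first, last = min(invoice_count), max(invoice_count)
days = [first + timedelta(days=n) for n in range((last - first).days + 1)]

# A Counter returns 0 for missing keys, so days without invoices are
# detected with a simple truth test and given a ratio of 0.
ratios = [dispute_count[day] / invoice_count[day] if invoice_count[day] else 0.0
          for day in days]
```

Here days and ratios have the same length and order, so they can be passed directly as the x and y values of a plot.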
You should have made it clearer that you were using pandas; it has built-in tools for what you want to do. In this case, you can use value_counts on your f and f2. With your example data:
>>> f = pd.to_datetime(f)
>>> f2 = pd.to_datetime(f2)
>>> f.value_counts()/f2.value_counts()
2015-05-01 0.666667
2015-05-02 0.500000
2015-05-03 0.250000
2015-05-04 NaN
2015-05-05 0.500000
2015-05-07 NaN
dtype: float64
>>> (f.value_counts()/f2.value_counts()).fillna(0.0)
2015-05-01 0.666667
2015-05-02 0.500000
2015-05-03 0.250000
2015-05-04 0.000000
2015-05-05 0.500000
2015-05-07 0.000000
dtype: float64
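Note that the output above has no row for 2015-05-06, because neither series contains that date. A minimal sketch of one way to add the missing days is to reindex both counts onto a daily pd.date_range before dividing. The variable names below (invoices, disputes, all_days, ratio) are illustrative and the sample data is inlined, so this is an assumption about how the data was loaded rather than the asker's actual code.

```python
import pandas as pd

# Sample data from the question, inlined here instead of being read from files.
invoices = pd.to_datetime(pd.Series(
    ["5/1/2015"] * 3 + ["5/2/2015"] * 4 + ["5/3/2015"] * 4
    + ["5/4/2015"] * 6 + ["5/5/2015"] * 4 + ["5/7/2015"]), format="%m/%d/%Y")
disputes = pd.to_datetime(pd.Series(
    ["5/1/2015"] * 2 + ["5/2/2015"] * 2 + ["5/3/2015"]
    + ["5/5/2015"] * 2), format="%m/%d/%Y")

# One entry per calendar day between the first and last invoice date.
all_days = pd.date_range(invoices.min(), invoices.max(), freq="D")

# Reindexing inserts a count of 0 for days that never occur in the data.
invoice_counts = invoices.value_counts().reindex(all_days, fill_value=0)
dispute_counts = disputes.value_counts().reindex(all_days, fill_value=0)

# 0 / 0 yields NaN for days with no invoices; replace those with 0.
ratio = (dispute_counts / invoice_counts).fillna(0.0)
```

The resulting ratio is indexed by date, so ratio.plot() should put the dates on the x-axis. A day with disputes but no invoices would come out as inf rather than 0 and would need separate handling.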