How to get an entry for every day in a list after counting occurrences

Question

I have two columns of dates in two separate CSV files. I am reading them into Python and plan to plot them with matplotlib.

One is for invoices:

5/1/2015
5/1/2015
5/1/2015
5/2/2015
5/2/2015
5/2/2015
5/2/2015
5/3/2015
5/3/2015
5/3/2015
5/3/2015
5/4/2015
5/4/2015
5/4/2015
5/4/2015
5/4/2015
5/4/2015
5/5/2015
5/5/2015
5/5/2015
5/5/2015
5/7/2015

And the other is for disputes:

5/1/2015
5/1/2015
5/2/2015
5/2/2015
5/3/2015
5/5/2015
5/5/2015

I want to make a list of the number of disputes divided by the number of invoices for each day. So for May 1, 2015 the percentage is 2/3; for May 2, 2015 it is 2/4; for May 3, 2015 it is 1/4; for May 4, 2015 it is 0; and for May 5, 2015 it is 2/4. There are no invoices or disputes on May 6, 2015, so the percentage should be zero, and May 7, 2015 has one invoice but no disputes, so it is zero too. Thus the list should be [.66, .5, .25, 0, .5, 0, 0].

Then I am going to plot the percentages on the y-axis against the date on the x-axis.
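For the plotting step, a minimal matplotlib sketch, assuming the per-day values have already been computed. The list below is the expected result for May 1-7, 2015 from the example above (with 2/3 written exactly), and the output filename dispute_ratio.png is a hypothetical choice.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this and call plt.show() for a window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# One entry per calendar day, May 1-7, 2015, including days with no data.
days = [dt.date(2015, 5, 1) + dt.timedelta(days=i) for i in range(7)]
ratios = [2 / 3, 0.5, 0.25, 0.0, 0.5, 0.0, 0.0]

fig, ax = plt.subplots()
ax.plot(days, ratios, marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Disputes / invoices")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m/%d"))
fig.autofmt_xdate()  # tilt the date labels so they don't overlap
fig.savefig("dispute_ratio.png")
```

Because every day is present in days, gaps such as May 6 show up as explicit zero points instead of being skipped on the x-axis.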

I have tried to use df.index.day, but then I get groups for the first day of each month, the second day, and so on. I was using value_counts to count the occurrences of each date and then dividing one list by the other, but I was missing entries for days where I had no invoices or disputes, and I want a value for every day.
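To illustrate why grouping on the day component merges different months, here is a minimal pandas sketch. The June date is hypothetical, added only to show the collision, and it uses the Series .dt.day accessor, which behaves like df.index.day does on a DatetimeIndex.

```python
import pandas as pd

# Hypothetical dates: two on May 1 and one on June 1, 2015.
dates = pd.to_datetime(pd.Series(["5/1/2015", "5/1/2015", "6/1/2015"]), format="%m/%d/%Y")

# Grouping by day-of-month puts May 1 and June 1 in the same bucket (key 1).
by_day_of_month = dates.groupby(dates.dt.day).size()

# Counting full calendar dates keeps each day separate; sort_index orders by date.
by_date = dates.value_counts().sort_index()
```

Here by_day_of_month has a single group (day 1, count 3), while by_date keeps 2015-05-01 and 2015-06-01 apart, which is the per-date count the question needs.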

Does anyone know a simple way to do this?

Answer 1

One easy way to get the counts is to use collections.Counter:

from collections import Counter
# Count how often each date string occurs (one date per line in the file)
with open('invoice_dates') as f:
    invoice_count = Counter(line.strip() for line in f)

and similarly for dispute_count. You can then build a dictionary mapping dates to dispute percentages with

from __future__ import division # in case you are on Python 2.x
dispute_percentage = {date: dispute_count.get(date, 0) / invoices
                      for date, invoices in invoice_count.items()}

Use iteritems() instead of items() in the last line if you are on Python 2.x.
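The dictionary above only has keys for dates that appear in the invoice file, so May 6 never shows up. A minimal sketch of one way to get a value for every day, assuming Python 3 and dates in month/day/year form: parse the strings into datetime.date objects, walk every day between the earliest and latest date, and use 0 where there are no invoices. The sample lists stand in for the two files, and parse_date is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample data from the question; in practice these lines come from the two files.
invoice_lines = (["5/1/2015"] * 3 + ["5/2/2015"] * 4 + ["5/3/2015"] * 4
                 + ["5/4/2015"] * 6 + ["5/5/2015"] * 4 + ["5/7/2015"])
dispute_lines = ["5/1/2015"] * 2 + ["5/2/2015"] * 2 + ["5/3/2015"] + ["5/5/2015"] * 2


def parse_date(s):
    # Hypothetical helper: strptime accepts non-zero-padded months and days like "5/1/2015".
    return datetime.strptime(s.strip(), "%m/%d/%Y").date()


invoice_count = Counter(parse_date(s) for s in invoice_lines)
dispute_count = Counter(parse_date(s) for s in dispute_lines)

# Every calendar day from the earliest to the latest date seen in either file.
all_dates = set(invoice_count) | set(dispute_count)
start, end = min(all_dates), max(all_dates)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Counter returns 0 for missing keys, so days without disputes give 0;
# days without invoices are set to 0 explicitly to avoid dividing by zero.
percentages = [dispute_count[d] / invoice_count[d] if invoice_count[d] else 0.0
               for d in days]
```

Parsing to date objects matters here: string keys such as "5/10/2015" would not sort in calendar order, and strings cannot be stepped one day at a time.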

Answer 2

You should have made it clearer that you were using pandas; it has built-in tools to help you do what you want. In this case, you can use value_counts on your f and f2 (here f holds the dispute dates and f2 the invoice dates). With your example data:

>>> f = pd.to_datetime(f)
>>> f2 = pd.to_datetime(f2)
>>> f.value_counts()/f2.value_counts()
2015-05-01    0.666667
2015-05-02    0.500000
2015-05-03    0.250000
2015-05-04         NaN
2015-05-05    0.500000
2015-05-07         NaN
dtype: float64
>>> (f.value_counts()/f2.value_counts()).fillna(0.0)
2015-05-01    0.666667
2015-05-02    0.500000
2015-05-03    0.250000
2015-05-04    0.000000
2015-05-05    0.500000
2015-05-07    0.000000
dtype: float64
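Note that value_counts only reports dates that actually occur, so May 6 is still missing from the result above. A minimal sketch of one way to get a row for every day, assuming the dates are parsed into two Series (named invoices and disputes here, hypothetical names): reindex both counts onto a continuous pd.date_range before dividing.

```python
import pandas as pd

# Sample data from the question; in practice these come from the two CSV files.
invoice_strs = (["5/1/2015"] * 3 + ["5/2/2015"] * 4 + ["5/3/2015"] * 4
                + ["5/4/2015"] * 6 + ["5/5/2015"] * 4 + ["5/7/2015"])
dispute_strs = ["5/1/2015"] * 2 + ["5/2/2015"] * 2 + ["5/3/2015"] + ["5/5/2015"] * 2

invoices = pd.to_datetime(pd.Series(invoice_strs), format="%m/%d/%Y")
disputes = pd.to_datetime(pd.Series(dispute_strs), format="%m/%d/%Y")

# One continuous daily index covering both files, so days with no rows
# at all (May 6 here) still get an entry.
days = pd.date_range(min(invoices.min(), disputes.min()),
                     max(invoices.max(), disputes.max()), freq="D")

# Count occurrences per date, then align onto the full range with 0 for absent days.
invoice_counts = invoices.value_counts().reindex(days, fill_value=0)
dispute_counts = disputes.value_counts().reindex(days, fill_value=0)

# Dividing by a zero invoice count gives NaN (0/0) or inf (x/0); the question
# asks for 0 on such days, so replace those entries explicitly.
ratio = (dispute_counts / invoice_counts).where(invoice_counts > 0, 0.0)

print(ratio)
```

Reindexing before dividing, rather than calling fillna afterwards, is what makes days absent from both files appear at all. With matplotlib installed, ratio.plot() then draws the values against the dates.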

Answer 1: score 1, accepted, posted 2015-08-18 20:06:06

Answer 2: score 1, posted 2015-08-18 21:04:18
